Query ETS and compare data in a map

Hi friends, I have an ETS table that stores my user tokens, and I need to go over all the records and reject the tokens which are expired.

Example

[
  {:"fa7e1d90-beee-4058-9a3e-c78eda344e71",
   "e17705be-ef53-4755-9a43-01b2f5c3aa89",
   %{
     access_expires_in: 1661548398,
     create_time: ~U[2022-06-26 21:13:57Z],
     last_used: ~U[2022-06-26 21:13:57Z],
     os: "linux",
     rel: nil,
     token: "USER_TOKEN",
     token_id: "fa7e1d90-beee-4058-9a3e-c78eda344e71",
     type: "refresh"
   }}
]

The first thing that comes to my mind is to load all records and compare their expiry times, but I do not think that is a good way.
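For reference, that naive approach can be sketched like this (the table layout and field names follow the example record above; this is a hypothetical sweep, not a recommendation):

```elixir
# Naive approach: copy the whole table out of ETS and filter in Elixir.
# It works, but every sweep copies all records into the calling process.
table = :ets.new(:tokens, [:set, :public])
now = DateTime.utc_now() |> DateTime.to_unix()

:ets.insert(table, {"id1", "user1", %{access_expires_in: now - 10}})    # expired
:ets.insert(table, {"id2", "user2", %{access_expires_in: now + 3600}})  # valid

expired =
  table
  |> :ets.tab2list()
  |> Enum.filter(fn {_id, _user, info} -> info.access_expires_in < now end)
```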

Another approach, which I was not able to get working, is ETS.Set.match_delete.

I can use something like this:

 ETS.Set.match_delete(table(), {:_, user_id, %{rel: user_token.token_id}})

But how can I get the record's access_expires_in and compare it inside the match_delete pattern?


Something like this, which I cannot use:

ETS.Set.match_object(table(), {:"$1", user_id, :"$3".access_expires_in < DateTime.utc_now()})

I tried to use something like this:

ms = :ets.fun2ms(fn {key, user_id, token_info} when token_info.access_expires_in < DateTime.utc_now() -> token_info end)

But I cannot use DateTime.utc_now() in an Elixir guard.

Note that I also tested this:

  def delete_expire_token() do
    time = DateTime.utc_now() |> DateTime.to_unix()
    pattern = {{:"$1", :"$2", :"$3"}, [{:>, {:map_get, :access_expires_in, :"$3"}, time}],[:"$3"]}
    ETS.Set.match_object(table(), pattern)
  end

But I always get {:ok, []}


Thank you in advance

Update

It works with:

def delete_expire_token() do
  time = DateTime.utc_now() |> DateTime.to_unix()
  pattern = [{{:"$1", :"$2", :"$3"}, [{:<, {:map_get, :access_expires_in, :"$3"}, time}], [:"$3"]}]
  ETS.Set.select(table(), pattern)
end

And for ETS.Set.select_delete(table(), pattern) it should be like this:

def delete_expire_token() do
  time = DateTime.utc_now() |> DateTime.to_unix()
  pattern = [{{:"$1", :"$2", :"$3"}, [{:<, {:map_get, :access_expires_in, :"$3"}, time}], [true]}]
  ETS.Set.select_delete(table(), pattern)
end

As you seem to have figured out: the :ets.select* APIs work with full match specifications (consisting of match head, guards and return value), while the :ets.match* APIs work with match heads only. It’s comparable to filtering/limiting function parameters with pattern matching, guards and function body transformation vs. doing it with just pattern matching.
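To make the three parts concrete, here is a minimal sketch using the raw :ets API (the table name and map fields are illustrative, mirroring the records above):

```elixir
table = :ets.new(:tokens, [:set, :public])
now = DateTime.utc_now() |> DateTime.to_unix()

:ets.insert(table, {:a, "user1", %{access_expires_in: now - 10}})    # expired
:ets.insert(table, {:b, "user2", %{access_expires_in: now + 3600}})  # still valid

spec = [
  {
    {:"$1", :"$2", :"$3"},                               # match head
    [{:<, {:map_get, :access_expires_in, :"$3"}, now}],  # guards: expired?
    [true]                                               # body: true = delete match
  }
]

deleted = :ets.select_delete(table, spec)
# only the still-valid token remains in the table
```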

2 Likes

Thank you @LostKobrakai, your comments on each of my posts have always been useful and informative to me.

Please consider that we have 3 items with a duplicate key, and we want to delete one record under that key.

For example, storing user_token with user_id as the key:

{user_id_one, token1}
{user_id_one, token2}
{user_id_one, token3}

I think match_delete costs less for the system than select_delete, am I right?

ETS.Bag.add!(table(), {user_id, user_token.token_info.token, user_token.token_info})
ETS.Bag.match_delete(table(), {user_id, "token2", :_})

I just want to delete "token2" of the records.
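A sketch of that delete with the raw :ets API (assuming a bag table keyed by user_id, as in the example above):

```elixir
# Bag table: several tokens can live under the same user_id key.
table = :ets.new(:tokens, [:bag, :public])
:ets.insert(table, {:user_id_one, "token1", %{}})
:ets.insert(table, {:user_id_one, "token2", %{}})
:ets.insert(table, {:user_id_one, "token3", %{}})

# Because the key is part of the pattern, match_delete only scans
# that key's records instead of the whole table.
true = :ets.match_delete(table, {:user_id_one, "token2", :_})
```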


For the example above, how can I test and find which approach is better? Let’s say we update "token3": is it better to delete all the tokens and add them again with the new parameter, or to delete one token and add a new one?
I am asking because we also have lookup, which uses user_id as the key, and I think that is faster because it does not use a pattern and the key is indexed by the table, isn’t it?

If you know the complete key then yes, lookup is faster. I don’t expect there to be a difference between match and select in terms of performance. Both need to search the whole table for all matches.

1 Like

As long as the primary key is given as part of the match pattern, it doesn’t need to search the whole table. From the match/2 docs:

If the key is specified in the pattern, the match is very efficient. If the key is not specified, that is, if it is a variable or an underscore, the entire table must be searched.

2 Likes

I guess more correctly it would be: You know the key – great it’s fast. You don’t know the key – yeah we need to search.

2 Likes

Hi,
When we have a duplicated key like this:

{user_id_one, token1}
{user_id_one, token2}
{user_id_one, token3}
  1. When we want to update {user_id_one, token2} to {user_id_one, token2 + 1}, should we delete all the records which were created under that key and add them back with the new changes?

  2. With the read_concurrency: true, write_concurrency: true flags on a public table, are race conditions possible? If yes, what should we do?

  3. When should we use the decentralized_counters flag for our table? I have read Decentralized ETS Counters for Better Scalability - Erlang/OTP, but unfortunately I could not fully understand it!
  1. No. You’ll want to use ETS.Bag.match_delete(bag, {user_id_one, token2}) to delete the exact tuple, then replace it with ETS.Bag.add(bag, {user_id_one, token2 + 1}).

  2. Race conditions are possible even when the concurrencies are false. You must use GenServer or some other middleman process to ensure data is serialized and/or linearized according to your needs. I recommend keeping the table private so only one process is allowed to write to it. The one thing to keep in mind with those concurrency flags, is that they can make performance better or worse, based on whether your reads are interspersed with your writes. For example:

read, read, write, write, read, write, read, write, read, read

Would be a bad case for concurrency because what happens is the table starts in read mode and reads very quickly, because the first two reads can happen at the same time. But then when it gets to the write task, it has to switch OFF of read mode and switch ON write mode to perform the task. And the next two writes will be fast because they happen concurrently, but the time it takes to switch from read mode to write mode and back and forth over and over again, that time adds up, and makes performance even worse than if you had concurrency disabled.

Now, if you instead had a system where reads and writes are batched and handled together in large groups:

read x5000, write x1000, read x5000, write x1000....

Now switching between read mode and write mode becomes negligible compared to the speed increase that concurrency will bring while handling a batch.

  3. Decentralized counters are another tradeoff based on your use case(s). Every ETS table needs to store some metadata regarding how many rows it holds and its memory footprint. Usually this metadata is stored in a way where it is easy to read by calling :ets.info/1 or :ets.info/2. However, if you don’t ever need to read this metadata, there is a tradeoff you can make. With decentralized_counters, the metadata is instead split into several different locations, making it take much longer to read, because it will have to read the data from each location and then aggregate it. The upside comes in because this metadata also needs to be updated when you insert or delete rows, and it is quicker to update the data when it is spread around. So this can give a slight performance increase for inserts and deletes, but it’s only good if you’re not using the :ets.info function. (Note: the Elixir ETS library uses :ets.info for the wrap_existing/1 function)
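For illustration, creating a table with decentralized counters enabled might look like this (these are real :ets options, available since OTP 23; the table name is made up):

```elixir
table =
  :ets.new(:tokens, [
    :set,
    :public,
    {:write_concurrency, true},
    {:decentralized_counters, true}
  ])

:ets.insert(table, {:k1, "v1"})
:ets.insert(table, {:k2, "v2"})

# :ets.info/2 still returns correct answers, it is just slower to
# compute, while concurrent inserts/deletes get cheaper.
size = :ets.info(table, :size)
```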
1 Like

I appreciate you very much

Would you mind explaining this part and showing an example, please? I changed my GenServer to ETS because I have concurrent requests for the tokens, and in this forum I have seen posts saying a GenServer is a single process, so with concurrent reads and writes it is going to be a bottleneck.
I start each ETS table with a GenServer, based on this article: https://dockyard.com/blog/2017/05/19/optimizing-elixir-and-phoenix-with-ets
I could use Oban for this, but when a user deletes a token, the change needs to take effect immediately.

I am very confused these days; I just want to create a RAM state for users to get a token, delete it, and add a new one.

I’m just saying that it’s not the ETS table which will prevent race conditions, but the GenServer which starts it and writes to it. It only handles one message at a time, so you can ensure serializability. If you allow clients to update the ETS table directly, you could end up with conflicting updates corrupting the data. Instead the clients should send messages to the owner (the GenServer) and the owner will perform the writes one at a time.

I am very confused these days; I just want to create a RAM state for users to get a token, delete it, and add a new one.

ETS is a good choice, especially if the table will grow very large. Don’t worry about the concurrency options yet. You will probably be just fine with the default options.

1 Like

An example would be

defmodule Manager do
  use GenServer
  ...
  def init(_) do
    # start ETS table
    {:ok, nil}
  end

  def handle_call({:write, user_id, token}, _, state) do
    Context.write_to_table(user_id, token)
    {:reply, :ok, state}
  end

  def write(user_id, token) do
    GenServer.call(__MODULE__, {:write, user_id, token})
  end
end

and then using Manager.write/2 instead of Context.write_to_table/2 directly. The Context function would of course have your ETS delete/insert/lookup etc. logic

1 Like

Ohh, Thank you very much, you made many things clear for me

For this specific part, you said a GenServer can handle many requests from different users (for example, in a Phoenix controller we call Manager.write)? Or is it going to be a bottleneck, stop responding, and users get timeouts?

I mean:

  def handle_call({:write, user_id, token}, _, state) do
    Context.write_to_table(user_id, token)
    {:reply, :ok, state}
  end

  def write(user_id, token) do
    GenServer.call(__MODULE__, {:write, user_id, token})
  end

Yes, you should be fine. ETS benchmarks show more than 1 million writes per second with a setup like this.

1 Like

Oh yeah, but for learning Elixir I wanted to ask the question above about GenServer. I have always tried to find that answer.

What question do you have about GenServer?

1 Like

Imagine I do not want to use the GenServer state. My question is not about ETS; please just consider the GenServer.

So you have 20k concurrent users sending requests like this: Manager.write(user_id, token), from your Phoenix controllers etc. Does the GenServer have a problem handling these 20k requests?

defmodule Manager do
  use GenServer
  ...
  def init(_) do
    ...
  end

  def handle_call({:write, user_id, token}, _, state) do
    MyDB.write(user_id, token)
    {:reply, :ok, state}
  end

  def write(user_id, token) do
    GenServer.call(__MODULE__, {:write, user_id, token})
  end
end

Or even doing something like 1 + 1 instead of storing to a database; I mean, can the GenServer handle the requests or not?

I don’t think there should be any issue at that order of magnitude. A benchmark on my machine with your 1 + 1 example shows a GenServer handling ~680,000 calls per second.

1 Like

So I misunderstood what was said in this post from the beginning. He meant that using the state is the problem, not the GenServer callback directly.

How do you test and benchmark something like this?

With Benchee you can define the task (like sending 1000 calls to a running GenServer), and it will tell you how many times per second it completes.
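If you don’t want to pull in Benchee, a rough home-grown measurement with :timer.tc looks like this (Adder is a hypothetical GenServer that just computes a + b per call, like the 1 + 1 example above; Benchee gives statistically sounder numbers):

```elixir
defmodule Adder do
  use GenServer
  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def init(_), do: {:ok, nil}
  def handle_call({:add, a, b}, _from, state), do: {:reply, a + b, state}
end

{:ok, _pid} = Adder.start_link(nil)

# Time 10,000 synchronous calls and derive a calls-per-second figure.
{micros, :ok} =
  :timer.tc(fn ->
    for _ <- 1..10_000, do: GenServer.call(Adder, {:add, 1, 1})
    :ok
  end)

calls_per_second = 10_000 * 1_000_000 / micros
```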

1 Like