Background
I have an ETS table where several processes can write concurrently. This table is under very heavy load, so having a GenServer as a gatekeeper serializing writes is not an option (it can't keep up).
Problem
Given the GenServer limitation, the solution would be to make each write atomic. Now, I am aware of update_counter, but there is one problem: my ETS table doesn't store counters, it stores lists of URLs:
def save_failed_request(url) do
  # read-modify-write: not atomic, so two concurrent callers can
  # read the same list and one of the updates will be lost
  urls = :ets.lookup(__MODULE__, :failed_urls)

  case urls do
    [] ->
      :ets.insert(__MODULE__, {:failed_urls, [url]})

    [failed_urls: list] ->
      new_list = [url | list]
      :ets.insert(__MODULE__, {:failed_urls, new_list})
  end

  {:ok, :saved}
end
This is the code I currently have to save a URL. Since multiple processes can call this function concurrently, multiple processes can be writing to the ETS table under the same key, which is not good.
Question
Is there any ETS function that allows me to make atomic updates on values stored in the table? Or do I have to use counters (since the only atomic update operation that exists is update_counter)?
I am not interested in querying anything. All I want this table to do is save URLs whose requests failed. Then, when the time comes, an outside process will query the table for all URLs (always all of them), and it just returns them and clears the table.
So, one key is all I need.
I don't quite understand. The :failed_urls key is supposed to hold a list of URLs. Are you suggesting I use a :bag ETS table instead of a :set and then just have multiple URLs saved there? Could you elaborate?
Try insert_new(tab, {url, url}) for each URL, then later foldl over the entire table.
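A minimal sketch of that suggestion (the table options and example URL are my assumptions, not part of the original reply):

tab = :ets.new(:failed_urls, [:set, :public, :named_table])

# each URL is its own key, so every write is one atomic insert;
# insert_new/2 returns false if the key already exists, which
# also deduplicates URLs for free
url = "https://example.com/failed"
:ets.insert_new(tab, {url, url})

# later, collect every stored URL by folding over the whole table
urls = :ets.foldl(fn {u, _}, acc -> [u | acc] end, [], tab)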
I was actually thinking about using http://erlang.org/doc/man/ets.html#take-2 to get everything with the same key and remove it in one go.
:ets.delete_all_objects(tab)
delete_all_objects doesn't actually return the deleted objects, which is something I want in addition to deleting them.
Then it really is going to be a :bag:
tab = :ets.new(name, [:bag])
:ets.insert(tab, {:failed, url1})
:ets.insert(tab, {:failed, url2})
failed = :ets.take(tab, :failed) |> Keyword.values()
# [url1, url2]
:bag comes with the overhead of checking for duplicates on insert, so unless you explicitly need to prevent duplicate full records, you should use :duplicate_bag over :bag. That is, it may be worth dealing with duplicates after take/2 instead.
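A hedged sketch of that combination (table name and options assumed):

tab = :ets.new(:failed_urls, [:duplicate_bag, :public, :named_table])

# inserts skip :bag's duplicate check, so writes stay cheap
:ets.insert(tab, {:failed, "https://example.com/a"})
:ets.insert(tab, {:failed, "https://example.com/a"})

# take/2 atomically returns and removes everything under the key;
# dedupe once here instead of paying for it on every insert
failed =
  tab
  |> :ets.take(:failed)
  |> Enum.map(fn {:failed, url} -> url end)
  |> Enum.uniq()
# => ["https://example.com/a"]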
It might be better for performance to use :duplicate_bag, which is documented to be faster on inserts, and deduplicate the URLs after using take/2.
Ahh, so we would be trading a faster insertion for a slower retrieval of information. I get the idea!
Precedent: https://youtu.be/8mXqxBBvNdk?t=1082
Possible problem: deleting entries (e.g. with take/2) in ETS can be relatively slow; it's faster to drop the table and recreate it.
Solution tradeoff: inserts into the table can fail temporarily, so errors must be caught and the insert retried later. This can be delegated to a short-lived task process if the inserting process cannot afford to be blocked. The steps (sketched below):
- rename the table with rename/2
- recreate a new table under the common, known name
- do with the renamed table what needs to be done; for example, the entire contents can be retrieved with :ets.tab2list/1 or :ets.match_object(tid, {:"$0", :"$1"}) (I don't know their performance relative to :ets.lookup(tid, :failed))
- when done, delete/1 the renamed table
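A minimal sketch of those steps, assuming a public, named :duplicate_bag table (the function and table names here are illustrative):

def drain(table \\ :failed_urls) do
  # between rename/2 and new/2 there is a window where inserts into
  # `table` raise; callers must catch and retry, as noted above
  old = :ets.rename(table, :failed_urls_draining)
  :ets.new(table, [:duplicate_bag, :public, :named_table])

  # drain the renamed copy, then drop it in one cheap operation
  urls =
    old
    |> :ets.tab2list()
    |> Enum.map(fn {_key, url} -> url end)
    |> Enum.uniq()

  :ets.delete(old)
  urls
end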
See also: ets-playground, cxy_cache.erl