Background
I have an ETS table where several processes can write concurrently. This table is under very heavy load, so having a GenServer as a gatekeeper serializing writes is not an option (it can't keep up).
Problem
Given the GenServer limitation, the solution would be to make each write atomic. Now, I am aware of update_counter, but there is one problem: my ETS table doesn't store counters, it stores lists of URLs:
def save_failed_request(url) do
  # read-modify-write: not atomic, so two concurrent callers can
  # read the same list and one of the updates will be lost
  urls = :ets.lookup(__MODULE__, :failed_urls)

  case urls do
    [] ->
      :ets.insert(__MODULE__, {:failed_urls, [url]})

    [failed_urls: list] ->
      new_list = [url | list]
      :ets.insert(__MODULE__, {:failed_urls, new_list})
  end

  {:ok, :saved}
end
This is the code I currently have to save a URL. Since multiple processes can call this function concurrently, multiple processes can be writing to the ETS table under the same key, which is not good.
Question
Is there any ETS function that allows me to make atomic updates on values stored in the table? Or do I have to use counters (since the only atomic update operation that exists is update_counter)?
I am not interested in querying anything. All I want this table to do is save URLs whose requests failed. Then, when the time comes, an outside process will query the table for all URLs (always all of them), and it just returns them and clears the table.
So, one key is all I need.
I don't quite understand. The :failed_urls key is supposed to hold a list of URLs. Are you suggesting I use a :bag ETS table instead of a :set and then just have multiple URLs saved there? Could you elaborate?
Try insert_new(tab, {url, url}) for each URL, then later foldl over the entire table.
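A minimal sketch of that suggestion (the table options and example URL are my assumptions, not part of the original reply):

tab = :ets.new(:failed_urls, [:set, :public, :named_table])

# each URL is its own key, so every write is one atomic insert;
# insert_new/2 returns false if the key already exists, which
# also deduplicates URLs for free
url = "https://example.com/failed"
:ets.insert_new(tab, {url, url})

# later, collect every stored URL by folding over the whole table
urls = :ets.foldl(fn {u, _}, acc -> [u | acc] end, [], tab)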
I was actually thinking about using http://erlang.org/doc/man/ets.html#take-2 to get everything with the same key and remove it in one go.
:ets.delete_all_objects(tab)
delete_all_objects doesn't actually return the deleted objects, which is something I want in addition to deleting them.
Then it really is going to be a :bag:
tab = :ets.new(name, [:bag])
:ets.insert(tab, {:failed, url1})
:ets.insert(tab, {:failed, url2})
failed = :ets.take(tab, :failed) |> Keyword.values()
# [url1, url2]
:bag comes with the overhead of checking for duplicates on insert, so unless you explicitly need to prevent duplicate full records, you should use :duplicate_bag over :bag. That is, it may be worth dealing with duplicates after take/2 instead.
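A hedged sketch of that combination (table name and options assumed):

tab = :ets.new(:failed_urls, [:duplicate_bag, :public, :named_table])

# inserts skip :bag's duplicate check, so writes stay cheap
:ets.insert(tab, {:failed, "https://example.com/a"})
:ets.insert(tab, {:failed, "https://example.com/a"})

# take/2 atomically returns and removes everything under the key;
# dedupe once here instead of paying for it on every insert
failed =
  tab
  |> :ets.take(:failed)
  |> Enum.map(fn {:failed, url} -> url end)
  |> Enum.uniq()
# => ["https://example.com/a"]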
It might be better for performance to use :duplicate_bag, which is documented to be faster on inserts, and deduplicate the URLs after using take/2.
Ahh, so we would be trading a faster insertion for a slower retrieval of information. I get the idea!
Precedent: https://youtu.be/8mXqxBBvNdk?t=1082
Possible problem: deleting entries (e.g. with take/2) in ETS can be relatively slow; it's faster to drop the table and recreate it.
Solution tradeoff: inserts into the table can fail temporarily, so errors must be caught and the insert retried later. This can be delegated to a short-lived task process if the inserting process cannot afford to be blocked. The steps (sketched below):
- rename the table with rename/2
- recreate a new table under the common, known name
- do with the renamed table what needs to be done; for example, the entire contents can be retrieved with :ets.tab2list/1 or :ets.match_object(tid, {:"$0", :"$1"}) (I don't know their performance relative to :ets.lookup(tid, :failed))
- when done, delete/1 the renamed table
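A minimal sketch of those steps, assuming a public, named :duplicate_bag table (the function and table names here are illustrative):

def drain(table \\ :failed_urls) do
  # between rename/2 and new/2 there is a window where inserts into
  # `table` raise; callers must catch and retry, as noted above
  old = :ets.rename(table, :failed_urls_draining)
  :ets.new(table, [:duplicate_bag, :public, :named_table])

  # drain the renamed copy, then drop it in one cheap operation
  urls =
    old
    |> :ets.tab2list()
    |> Enum.map(fn {_key, url} -> url end)
    |> Enum.uniq()

  :ets.delete(old)
  urls
end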
See also: ets-playground, cxy_cache.erl