Cruz
How to avoid the performance impact of large ETS-based queries?
Hi,
I’m writing a service in which each transaction starts with the system returning a “catalog” of available products to the caller/client. There are more than 1,000 products and 11 different catalogs. Each product contains a series of IDs, a name and description in two languages, multiple flags, and other data. In other words, each catalog is fairly large.
Given how frequently the catalogs are used, I thought I would load them into ETS tables for quick access. However, I found the following article:
https://medium.com/@jacob.lerche/using-constant-pools-to-speed-up-your-elixir-code-c527d533c941
The author suggests using a macro when the data is too large for ETS. However, he doesn’t say how much data is too much, and using macros means recompiling every time the data changes. That might be OK, but I would prefer to avoid it if possible.
Has anyone run into this issue before? Do you know at what point the size of the data becomes a problem for ETS?
I plan to do some load testing with the ETS-based solution. Any other suggestions?
Thank you
Marked As Solved
benwilson512
Hey @Cruz. Does each transaction load the entire catalog or just specific items within it? The article is talking about the copying penalty you pay when the items you look up are themselves very large, not necessarily when the whole table is large. If the whole ETS table is large but each item is small, and you only want a few items, those are the only items that are copied.
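To make the copying point concrete, here is a minimal sketch. It assumes one ETS row per product, keyed by `{catalog_id, product_id}`; the table name and the product fields are made up for illustration and are not from the original post.

```elixir
# Hypothetical table name and product shape, for illustration only.
table = :ets.new(:catalog_sketch, [:set, :public, read_concurrency: true])

# One row per product, keyed by {catalog_id, product_id}.
for catalog_id <- 1..11, product_id <- 1..1_000 do
  product = %{id: product_id, name: "Product #{product_id}", flags: [:active]}
  :ets.insert(table, {{catalog_id, product_id}, product})
end

# A single lookup copies only this one row into the calling process.
[{_key, product}] = :ets.lookup(table, {3, 42})

# Fetching a whole catalog copies every matching row, which is where
# the copying cost would show up.
whole_catalog = :ets.match_object(table, {{3, :_}, :_})
```

If every transaction really needs the entire catalog, the second call is the one worth load testing, since its cost grows with the size of the catalog rather than the size of one product.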
These days, however, if it’s a very constant sort of thing, I’d look at using http://erlang.org/doc/man/persistent_term.html. You get basically the same performance and copying characteristics without having to deal with macros.
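A rough sketch of what that could look like, assuming each catalog is stored as a map under its own key. The key `{:catalog_sketch, 3}` and the product fields are hypothetical.

```elixir
# Hypothetical catalog: a map of product_id => product.
catalog = Map.new(1..1_000, fn id -> {id, %{id: id, name: "Product #{id}"}} end)

# Store each catalog under its own key. Writes are expensive compared
# to reads, so this suits data that changes rarely.
:persistent_term.put({:catalog_sketch, 3}, catalog)

# Reading does not copy the term into the caller's heap.
products = :persistent_term.get({:catalog_sketch, 3})
product = Map.fetch!(products, 42)
```

One thing to keep in mind: according to the `:persistent_term` documentation, replacing or erasing a term triggers a scan of all processes, so this fits catalogs that are updated occasionally rather than constantly.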
Also Liked
jacoblerche
Hey there, author of the article here. Ben already gave a very comprehensive answer, so I’ll just add a few points about when to use constant pools:
- Large data that needs to be read by a lot of processes. If it’s just a handful of processes, you might be better off with ETS or something else, unless the data is gigantically large
- Data that still needs to be updated regularly, but infrequently compared to the reads
I should note that compiling a module with static data is actually deceptively fast, especially if you use Module.create/3.
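For reference, a minimal sketch of building such a module at runtime with `Module.create/3`. The module name `CatalogSketch.Constants` and the data are placeholders, not code from the article.

```elixir
# Hypothetical catalog data to embed in the module.
catalog = Map.new(1..1_000, fn id -> {id, %{id: id, name: "Product #{id}"}} end)

contents =
  quote do
    # The data is embedded as a literal in the compiled module.
    def catalog, do: unquote(Macro.escape(catalog))
  end

# Running this again with new data recompiles and replaces the module.
Module.create(CatalogSketch.Constants, contents, Macro.Env.location(__ENV__))

product = Map.fetch!(CatalogSketch.Constants.catalog(), 42)
```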
I should also point out that binary data benefits from reference counting. IIRC, if a binary is larger than 64 bytes, only a reference to it is copied over to a process. Just another thing to consider if it fits your needs.
michalmuskala
I saw the mention of :persistent_term, so I’d like to underline that, now that we have :persistent_term, dynamically compiling modules with static data is in almost all scenarios going to be slower and do more work than using :persistent_term. I’d consider that technique to be largely obsolete now.







