Data caching: Agents or ETS?

brightball · September 10, 2016, 11:17pm

I’ve been looking over OTP and exploring data caching options a little bit (for a single node). Nothing fancy, but I’m curious whether it would be better to use the Agent-per-cached-item pattern or just store everything in ETS. Thoughts? Trade-offs? Proper form?

sotojuan · September 11, 2016, 6:50pm

The Elixir guide says:

Warning! Don’t use ETS as a cache prematurely! Log and analyze your application performance and identify which parts are bottlenecks, so you know whether you should cache, and what you should cache. This chapter is merely an example of how ETS can be used, once you’ve determined the need.

http://elixir-lang.org/getting-started/mix-otp/ets.html

So start with an agent?

kaqqao · September 11, 2016, 6:54pm

Have a look at my thread from a while ago: GenServer use-cases
My question was two-fold, but one half of it was pretty much the same as this, and it got very good answers.

benwilson512 · September 11, 2016, 7:27pm

There are a few questions that you need to answer in order to know what the best route is.

What are you trying to cache?
How will it be accessed?
How is it updated or invalidated?

The most important question of all:

Does it really need to be cached?

Without answering these questions it is impossible to recommend the right approach.

brightball · September 11, 2016, 10:52pm

Mainly thinking of a slow API call or query when I asked the question

benwilson512 · September 12, 2016, 1:46am

If what you want is a plain old cache then :ets it is. https://hex.pm/packages/con_cache is a nice wrapper that focuses on the cache use case.

vasspilka · September 12, 2016, 8:23am

I think ETS is better suited for a cache.

However, I’ve read somewhere that caching in Phoenix/Ecto should be done only when absolutely necessary.
Meaning you should avoid caching prematurely. Unless you are doing really complex queries, chances are that it is not needed.
Ecto is a LOT faster than Active Record, therefore it is often faster without a cache than AR is with one.

I would suggest not doing any caching at first. If you come across performance issues, check whether your query can be improved, as that is most likely the cause. If the query is fine and optimized but still slow (which should be true for only a small fraction of queries), then you can use ETS or some 3rd party package for caching.

sasajuric · September 12, 2016, 9:48am

Agent is a simple solution that could work for smaller loads and a few client processes. An ETS table should usually perform better and can support concurrent clients, i.e. you could have multiple simultaneous readers/writers, something not possible with Agent/GenServer. It is however very limited in terms of atomic operations, so it’s mostly suitable for simple k-v stuff and some concurrent counters.
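To make the "simple k-v stuff and some concurrent counters" point concrete, here is a minimal sketch using only the standard `:ets` module. The table name `:demo_cache` and the keys are made up for illustration; the options shown are one reasonable choice, not a recommendation from the post.

```elixir
# Hypothetical table name; :public lets every process read and write.
table =
  :ets.new(:demo_cache, [
    :set,
    :public,
    :named_table,
    read_concurrency: true,
    write_concurrency: true
  ])

# Plain key-value writes and reads can come from any process.
:ets.insert(table, {:greeting, "hello"})
[{:greeting, value}] = :ets.lookup(table, :greeting)

# update_counter/4 is one of the few atomic operations ETS offers:
# 100 processes bump the same counter without losing an update.
1..100
|> Enum.map(fn _ ->
  Task.async(fn -> :ets.update_counter(table, :hits, 1, {:hits, 0}) end)
end)
|> Enum.each(&Task.await/1)

[{:hits, hits}] = :ets.lookup(table, :hits)
IO.inspect({value, hits})
```

Anything beyond a single insert, lookup or counter update (for example read, modify, write back) is not atomic on its own.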

Personally, if I know that there will be multiple clients of a key-value store, I just go for ETS immediately, because I believe this is what it was made for. That being said, some cases are in the grey area, so starting with a simple Agent is a somewhat simpler and more flexible solution. Assuming you encapsulate cache operations with some module, switching to ETS should be easy, because you’ll likely need to change the implementation in only one module (the cache wrapper).
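A rough sketch of the "encapsulate cache operations with some module" idea, starting with an Agent. `SimpleCache` and its function names are hypothetical; the point is only that callers never touch the Agent directly, so the body of each function could later be swapped for `:ets` calls without changing any call site.

```elixir
defmodule SimpleCache do
  # Hypothetical wrapper: callers only see start_link/0, get/1 and put/2.
  # The Agent is an implementation detail hidden behind those functions.

  def start_link do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Returns the cached value, or nil when the key is absent.
  def get(key), do: Agent.get(__MODULE__, &Map.get(&1, key))

  def put(key, value), do: Agent.update(__MODULE__, &Map.put(&1, key, value))
end

{:ok, _pid} = SimpleCache.start_link()
:ok = SimpleCache.put(:user_42, %{name: "Ann"})
IO.inspect(SimpleCache.get(:user_42))
IO.inspect(SimpleCache.get(:missing))
```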

Finally, as others have pointed out, think carefully about whether you even need a cache. All other things being equal, cacheless is better than cacheful (because of less complexity), so if you can get away without it, that will be the simplest solution.

rvirding · September 12, 2016, 2:05pm

Yes, ETS has basically no atomic operations; it is a data store, not a database. So if you use ETS and need atomicity you will probably need a process in front of it to handle interactions. You can, however, be cunning and mix atomic operations through the process with “dirty” reads directly from the ETS tables.
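One way the "process in front, dirty reads on the side" arrangement might look, as a sketch under assumptions: `GuardedStore`, its table name and its `update/3` function are invented for this example. The owning GenServer is the only writer (the table is `:protected`), while `get/1` reads the table directly from the calling process.

```elixir
defmodule GuardedStore do
  use GenServer

  # Hypothetical table name.
  @table :guarded_store

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # "Dirty" read: goes straight to ETS, never through the owner process.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> value
      [] -> nil
    end
  end

  # Read-modify-write is serialized by the owner process, so concurrent
  # update/3 calls cannot interleave.
  def update(key, default, fun) do
    GenServer.call(__MODULE__, {:update, key, default, fun})
  end

  @impl true
  def init(:ok) do
    # :protected lets any process read but only the owner write.
    :ets.new(@table, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:update, key, default, fun}, _from, state) do
    current =
      case :ets.lookup(@table, key) do
        [{^key, value}] -> value
        [] -> default
      end

    new_value = fun.(current)
    :ets.insert(@table, {key, new_value})
    {:reply, new_value, state}
  end
end

{:ok, _pid} = GuardedStore.start_link()

# 50 concurrent writers each prepend one id; none of them is lost.
1..50
|> Enum.map(fn i ->
  Task.async(fn -> GuardedStore.update(:ids, [], fn ids -> [i | ids] end) end)
end)
|> Enum.each(&Task.await/1)

id_count = length(GuardedStore.get(:ids))
IO.inspect(id_count)
```

The trade-off is that a dirty read may see the value from just before an update that is still in flight, which is usually acceptable for a cache.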

brightball · September 12, 2016, 2:30pm

Thanks for the answers. This is a situation where I don’t actually need a cache, I’m mainly exploring the best options to handle it if I get to a situation where I do need one so that I’m not coming at the problem unprepared.

OvermindDL1 · September 12, 2016, 2:44pm

I use the Cachex library for caching in my place (it is like con_cache but a bit more full-featured; specifically, it had one feature I wanted that con_cache did not have), and I only use it to hold permission information for a given user and only for a very short period of time (since there can otherwise be dozens of DB lookups across a few different processes for a single ‘request’). I’ve not seen any need to cache anything else yet; PostgreSQL is fast enough if well structured, and Ecto is wonderful.

sasajuric · September 12, 2016, 6:35pm

The main reason I wrote ConCache was to solve the following problem. I had to deal with a bunch of clients (~5k) which had to continuously fetch updates from ~50 producers. Now, each client can pick which producers they are interested in, so in theory each client receives a special combination of data.

Combining multiple updates from multiple producers was CPU intensive, so before the cache, my CPU usage was high. Now, in practice, only a few combinations were used by most people, so I wanted to profit from that and cache data as it is computed.

So say that you want to fetch a, b, and c, and I want to fetch the same combination. It will be computed for one of us and cached so all others will fetch the result straight from the cache. Even if we’re doing this at the same time, only one of us will run the computation. Finally, owing to TTL support, the cached data will be purged fairly quickly.
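The compute-once-then-share behaviour can be sketched with plain OTP. This is not how ConCache is implemented; it is a simplified stand-in in which a single GenServer serializes all cache misses (so unrelated keys also wait on each other) and expired entries are ignored rather than purged. `OnDemandCache`, `fetch/3` and the table name are hypothetical.

```elixir
defmodule OnDemandCache do
  use GenServer

  # Hypothetical table name.
  @table :on_demand_cache

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Returns the cached value for key, or runs fun once and keeps the
  # result for ttl_ms milliseconds.
  def fetch(key, ttl_ms, fun) do
    case lookup(key) do
      {:ok, value} -> value
      :miss -> GenServer.call(__MODULE__, {:compute, key, ttl_ms, fun}, :infinity)
    end
  end

  defp lookup(key) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] when expires_at > now -> {:ok, value}
      _ -> :miss
    end
  end

  @impl true
  def init(:ok) do
    :ets.new(@table, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:compute, key, ttl_ms, fun}, _from, state) do
    # Re-check inside the owner: another caller may have filled the entry
    # while this request was waiting in the mailbox.
    value =
      case lookup(key) do
        {:ok, cached} ->
          cached

        :miss ->
          computed = fun.()
          expires_at = System.monotonic_time(:millisecond) + ttl_ms
          :ets.insert(@table, {key, computed, expires_at})
          computed
      end

    {:reply, value, state}
  end
end

{:ok, _pid} = OnDemandCache.start_link()
{:ok, counter} = Agent.start_link(fn -> 0 end)

# Stand-in for the CPU-intensive combination step; counts how often it runs.
slow = fn ->
  Agent.update(counter, &(&1 + 1))
  Process.sleep(50)
  :expensive_result
end

results =
  1..10
  |> Enum.map(fn _ ->
    Task.async(fn -> OnDemandCache.fetch({:a, :b, :c}, 60_000, slow) end)
  end)
  |> Enum.map(&Task.await/1)

computations = Agent.get(counter, & &1)
IO.inspect({Enum.uniq(results), computations})
```

Ten clients ask for the same combination at the same time, yet the expensive function runs only once and the other nine get the stored result.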

So I found a cache useful for computing something on first demand and keeping it around for a while, in case someone else needs it. As I recall, in this particular case I had some big perf improvements with the cache. I don’t recall using it for anything else though. So yes, there’s less need to do that in the Elixir/Erlang world, but it can still be useful occasionally.

Styx · September 15, 2016, 12:09pm

I think the main idea here is not to cache just because you think you should. And IF you really need to cache (after profiling), then use ETS.
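Short of a full profiler, a first measurement can be as simple as timing the suspect call with `:timer.tc/1` from the Erlang standard library. In this sketch `slow_call` is a hypothetical stand-in for the slow API call or query; the sleep only simulates latency.

```elixir
# Hypothetical stand-in for a slow API call or query.
slow_call = fn ->
  Process.sleep(20)
  :result
end

# :timer.tc/1 runs the function and returns {elapsed_microseconds, result}.
{micros, result} = :timer.tc(slow_call)
IO.puts("took #{div(micros, 1000)} ms, returned #{inspect(result)}")
```

A single timing like this is noisy, so it is worth repeating the call a number of times before concluding that a cache is needed.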

brightball · September 15, 2016, 1:10pm

Any suggestions for profiling tools?

Styx · September 15, 2016, 2:27pm

Haven’t used any, just because my programs are too simple (: So I can’t recommend you anything but google.

yurko · September 15, 2016, 5:06pm

I’ve researched the profiling options recently. exometer / elixometer seems like a good option, and there is also fluxter, which connects the app to the Influx stack. Here are a few links:

starbeast · January 31, 2017, 3:08pm

In my case I was using ETS (wrapped by ConCache because I needed update locks), but my use case raises another question. There are activation and deactivation events for specific objects, and when one occurs I update a key holding the list of currently active objects. I still don’t know what to do if ETS goes down (machine reboot, table corruption, etc.). My solution is to dump periodically into a DETS file, but I don’t yet know how to tweak ConCache to fill its keys from the DETS dump on startup/restart. Should I probably use Mnesia in this case?

OvermindDL1 · January 31, 2017, 3:13pm

ConCache, ETS, Cachex (I used it over ConCache due to a few features that were useful to me) and so forth are, as most of their names imply, good for caching data. If you are storing data that should be persisted, then you should definitely use something else, like Mnesia, a database, etc. I have ConCache in front of the permissions table in my database, for example. I hit it to get a list of permissions for a user quite often and from random places throughout the code. It has a timeout of 5 minutes, and any time a permission is changed for a user the entry is cleared. It improved throughput quite nicely and reduced database access substantially.

sasajuric · January 31, 2017, 3:45pm

When I wrote ConCache, one need I had was to keep the data across restarts. Therefore, there is a :callback option where you can provide a function which is invoked when an item is modified or deleted (even if it’s deleted due to TTL expiry). See the Callback section here for a brief description.

The idea is to use this callback to persist each change to disk. Then during your application startup, you can read that data and prime the cache.
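The persist-then-prime cycle can be illustrated with plain ETS and DETS from the Erlang standard library. This sketch deliberately does not use ConCache or its callback; the `put` function below stands in for whatever the callback would do on each change. The file name, table names and key are hypothetical.

```elixir
# Hypothetical backup file in the system temp directory.
path = Path.join(System.tmp_dir!(), "cache_backup_demo.dets")
File.rm(path)

{:ok, dets} =
  :dets.open_file(:cache_backup, file: String.to_charlist(path), type: :set)

# While running: mirror every cache write to disk.
cache = :ets.new(:live_cache, [:set, :public])

put = fn key, value ->
  :ets.insert(cache, {key, value})
  :ok = :dets.insert(dets, {key, value})
end

put.(:active_objects, [1, 2, 3])

# Simulate a shutdown: the ETS table is gone, the DETS file survives.
:ok = :dets.close(dets)
:ets.delete(cache)

# After a restart: reopen the file and prime a fresh ETS table from it.
{:ok, dets} =
  :dets.open_file(:cache_backup, file: String.to_charlist(path), type: :set)

fresh = :ets.new(:live_cache, [:set, :public])
^fresh = :dets.to_ets(dets, fresh)

[{:active_objects, restored}] = :ets.lookup(fresh, :active_objects)
IO.inspect(restored)

:ok = :dets.close(dets)
File.rm(path)
```

Writing to disk on every change costs more than a periodic dump but avoids losing updates made between dumps. Deletions would need a matching `:dets.delete/2` call to keep the file in step with the cache.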

OvermindDL1 · January 31, 2017, 3:57pm

Heh, one of the few features of ConCache that I do not use since my usage is already backed by PostgreSQL. ^.^

Fantastic library though, thanks for it!