Caching Strategies in Elixir for Microservices

This is a general water-cooler type of question for all the smart folks in this forum – thanks for any feedback, and apologies if this is too wordy.

The problem: requests to /api/expensive respond too slowly.
The solution: caching. (Unless there’s some miracle alternative?).

One common approach implements a callback… i.e. “if a cached result exists, return it; if not, perform the callback: do the expensive operation, cache the result, and then return it.” That’s fairly clean to implement, but I’ve seen that solution fail when lots of requests enter the callback before the cached result is computed. I’ve heard this condition referred to as a “cache slam”, and in PHP (and presumably in other languages too?), a semaphore was required to lock and serialize requests in order to avoid it. Frequently, the database was the unavoidable “expensive” thing in this scenario.
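For concreteness, here's a rough sketch of that naive read-through pattern (every name here — `cache_get`, `cache_put`, `run_expensive_query` — is a placeholder, not a real API). The comment marks the window where the slam happens:

```elixir
# Naive read-through caching: all helper names are placeholders.
def fetch_expensive(key) do
  case cache_get(key) do
    {:ok, value} ->
      value

    :miss ->
      # Cache-slam window: any request arriving before cache_put/2
      # completes also sees a miss and re-runs the expensive query.
      value = run_expensive_query(key)
      cache_put(key, value)
      value
  end
end
```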

One of the drawbacks of the above is that the code isn’t as clean… you can’t tell from looking at the route whether or not the results accurately reflect what is in the data model. You don’t know if you’re looking at cached data or the results directly from the database.

So, the next tweak is often to add an optional “refresh” parameter that triggers a fresh lookup. Although that works, you sacrifice even more clarity in your code.

In a super-clean/transparent API world of resources, I thought it might be cleaner to implement a dedicated cache service. In other words, /api/expensive ALWAYS performs that expensive operation (just as its name suggests), and the cache service exposes a route like /api/cache/expensive that stores the result of that operation. You could make POST/PUT requests to the cache service to add/update its contents, and there would never be any guessing – the cache endpoints would contain cached data, just as their names suggest.

Behind the scenes, we’d have to implement some message queue or callbacks to ensure that any changes to the data behind one of the expensive endpoints would cause the result to be stored in the cache service. Thoughts?
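One way the callback half of that might look in Phoenix is to broadcast on every write and have the cache service refresh itself. This is only a sketch: `Phoenix.PubSub` and its `broadcast`/`handle_info` flow are real, but `MyApp.PubSub`, the topic name, and the helper functions are assumptions:

```elixir
# Writer side: after changing the underlying data, announce it.
def update_expensive_source(id, params) do
  result = do_update(id, params)   # placeholder for the real write
  Phoenix.PubSub.broadcast(MyApp.PubSub, "expensive:changed", {:refresh, id})
  result
end

# Cache-service side: a process subscribed to "expensive:changed"
# recomputes and stores the fresh result, so /api/cache/expensive
# never serves stale data.
def handle_info({:refresh, id}, state) do
  store_in_cache(id, run_expensive_query(id))  # placeholder helpers
  {:noreply, state}
end
```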

I’ve seen some discussion like this:

That’s really over my head, but is that a recommended solution to this problem?

I’m new to Elixir/Erlang/Phoenix, so I admit that my notions of caching are probably out of whack with what is idiomatic here, so I’d love some guidance and thoughts on how others have dealt with this problem. Many thanks!

I use the Cachex library for this.
I just set a fallback function that queries the database, set how often each entry should be considered expired (configurable key-by-key if you need it), and set the janitor to sweep every once in a while to clean up old stale entries; then I hit the cache as often as I want. It locks per key, so concurrent access to a single key pauses until the first access resolves and is cached, then the value is returned to all waiters – while lookups on other keys run concurrently. You can then just use the Cachex API (I wrap it in case I ever change it, of course) to access that value: it returns immediately via ETS if the value exists, otherwise it locks and waits until the fallback function returns. The result even tells you whether it came from the cache or had to be acquired, if you need to know that. You can purge any key (or everything) whenever you want, set expirations per key, etc… etc…
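A minimal sketch of that setup, assuming Cachex v3.x (the cache name and query function are placeholders; `Cachex.fetch/3` is the real read-through call described above):

```elixir
# In your supervision tree: start a named cache.
children = [
  {Cachex, name: :query_cache}
]

# Read-through access: the fallback runs only on a miss, and Cachex
# locks per key, so concurrent callers wait for the first computation.
def expensive(id) do
  Cachex.fetch(:query_cache, {:expensive, id}, fn _key ->
    # :commit stores the computed value; returning :ignore instead
    # would hand the value back without caching it.
    {:commit, run_expensive_query(id)}
  end)
end
```

The return tuple is how Cachex tells you where the value came from: `{:ok, value}` means it was already cached, `{:commit, value}` means the fallback just ran.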

It is single-node only, though, so if you need to keep multiple nodes in ‘eventual sync’, just send an (RPC?) message between them to clear that key on all nodes when you perform an update in the database. That’s what I do – eventual consistency in the range of a second is more than enough for what I’m caching. Otherwise I hit the database every time and use a materialized view to let the database do the caching of complex queries.
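That cross-node clear can be quite small. A sketch, assuming a cache named `:query_cache`: `Cachex.del/2` is the real delete call, and `:erpc` ships with OTP 23+; whether fire-and-forget fan-out is acceptable depends on your consistency needs.

```elixir
# After a successful database update, drop the stale key everywhere.
def invalidate(key) do
  Cachex.del(:query_cache, key)
  # Best-effort fan-out to all other connected nodes; no replies are
  # collected, which gives the "eventual sync within about a second"
  # behaviour described above.
  :erpc.multicast(Node.list(), Cachex, :del, [:query_cache, key])
end
```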

I’m not really a fan of stack overflow the more I see of it… >.>


The Stack Overflow responses are pretty much in line with common caching, though I would opt for standard GenServers instead of Agents – they’re really straightforward to work with once you get the hang of them, and general-purpose enough for all sorts of business logic. All calls to a GenServer are processed in order, so you can’t really “cache slam” one.

I’m guessing your primary use case is complicated DB queries that need to be cached. This is probably overly simplified, but in a lot of cases you don’t actually need a database to hold your data. If your working data set can be held in RAM, throw it on a GenServer and then use the database as a cold-storage backup.
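A minimal sketch of that idea (the module and helper names are made up): the working set lives in the GenServer’s state, calls are handled one at a time so a miss can never race, and the database only shows up as write-through cold storage.

```elixir
defmodule MyApp.Store do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def get(key), do: GenServer.call(__MODULE__, {:get, key})
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  @impl true
  def init(_opts) do
    # Warm the in-memory state from cold storage once at boot.
    {:ok, load_all_from_db()}
  end

  @impl true
  def handle_call({:get, key}, _from, state) do
    # Calls are serialized, so two requests never race on the same key.
    {:reply, Map.get(state, key), state}
  end

  def handle_call({:put, key, value}, _from, state) do
    persist_to_db(key, value)  # write-through to the backup store
    {:reply, :ok, Map.put(state, key, value)}
  end

  defp load_all_from_db, do: %{}             # placeholder
  defp persist_to_db(_key, _value), do: :ok  # placeholder
end
```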


In general I’d prefer Cachex over a GenServer since it is backed by :ets. That means that if the data is already cached (and fresh), reads don’t need to be serialized through a GenServer, resulting in higher performance. There are also a lot of correctness guarantees that Cachex already solves which you’d have to re-implement in a custom GenServer.


Cachex is great, don’t get me wrong, and I totally use it when I need a quick KV cache, but perhaps I didn’t fully clarify the point I was making. Treat a GenServer as your data store, complete with the necessary business logic, and you’ll probably find yourself needing a cache less often.


Yeah that is a great point. As an industry we’ve come to overly rely on caches when other constructs could potentially make more sense for the given problem.