GenServer Write-Through Cache (or Durable GenServers)

I’m looking for the right way to model durable state without too much repetition. I really like GenServer for building a stateful server, but of course it doesn’t offer any mechanism for persistence.

What I’d like to do is offer a sort of write-through cache GenServer. Whenever we return new state, it gets written to persistent storage somewhere, reads are served from the process directly (it’s always live), and whenever the process reboots, it rebuilds its state from persistent storage.

Doing this manually is trivial… but I’d rather not couple each GenServer’s implementation to the persistence layer. So, thinking out loud, I can think of a few potential options:

  • extending GenServer (is this possible?)
  • wrapping with a supervisor + sibling to handle data fetching and initial loading

Any direction would be appreciated!
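For concreteness, the “manual” version being described might look something like this minimal sketch; MyApp.Persistence (with load/1 and save/2) is a hypothetical placeholder, not a real library:

```elixir
defmodule MyApp.StatefulService do
  use GenServer

  @name :stateful_service

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: @name)

  @impl true
  def init(:ok) do
    # Rebuild from persistent storage on every (re)boot, falling back to a default.
    state =
      case MyApp.Persistence.load(@name) do
        {:ok, saved} -> saved
        :error -> %{}
      end

    {:ok, state}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    new_state = Map.put(state, key, value)
    # Write through to persistence before acknowledging the write.
    :ok = MyApp.Persistence.save(@name, new_state)
    {:reply, :ok, new_state}
  end

  def handle_call({:get, key}, _from, state) do
    # Reads are served straight from the in-memory state.
    {:reply, Map.get(state, key), state}
  end
end
```

Every write-handling callback has to remember the MyApp.Persistence.save/2 call, which is exactly the coupling and repetition in question.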


A question: what if the persistent backend gets updated outside of your cache, e.g. if your system becomes distributed?

EDIT: As a note, Cachex is such a cache, with explicit commands to dump/load the entire cache state as well (you can also do calls through the cache and have it auto-load/generate data when it’s missing; it is not distributed, though).

If using Postgres for persistence, you can set up triggers on the table(s) that call pg_notify when data changes, and use Postgrex.Notifications to send a message to your GenServer.

It has some nice properties: notifications are de-duplicated within a transaction, and they are only skipped if the transaction is aborted.
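A rough sketch of the listening side, assuming a trigger that calls pg_notify('state_changed', ...) on the relevant table; the channel name, connection options, and MyApp.StatefulService.reload/1 are all placeholders:

```elixir
defmodule MyApp.ChangeListener do
  use GenServer

  def start_link(postgrex_opts), do: GenServer.start_link(__MODULE__, postgrex_opts)

  @impl true
  def init(postgrex_opts) do
    # Dedicated notification connection, separate from the normal query pool.
    {:ok, notifications} = Postgrex.Notifications.start_link(postgrex_opts)
    {:ok, _ref} = Postgrex.Notifications.listen(notifications, "state_changed")
    {:ok, %{notifications: notifications}}
  end

  @impl true
  def handle_info({:notification, _pid, _ref, "state_changed", payload}, state) do
    # Tell the owning process to refresh itself from the database.
    MyApp.StatefulService.reload(payload)
    {:noreply, state}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```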


Not to toot my own cheroot, but I’m planning on devoting a lot of energy in the new year to my Mnemonix library, which will include a very versatile API to accomplish exactly this functionality in v1.1.0. Watch this space!

We’re still fine here… at least, the way it’s designed, the intent is:

  • This system (all nodes) is the sole user of the chosen datastore (might be ETS, might be PG, I don’t care)
  • Each actor (preferably just ones that have registered names) should be able to bounce around the cluster, as long as they all have access to persistence to rebuild their state when restarted.

For example,

  • GenServer “stateful_service” boots for the first time
  • It’s the first run, so it defaults to some known value per its module definition
  • As users interact with this process, its state is mutated and is immediately written back to persistence before responding or acknowledging the write
  • All reads go through the process’s in-memory state, which is the latest
  • Upon reboot, system failure, etc. it is reloaded and pulls from persistent storage
  • Latest state is live again

Trying to remain as persistence-agnostic as possible… but that is an interesting approach for sure!

Cool, this would be good for centralizing the repository/persistence access.

To be more specific, I’m stuck on the GenServer integration aspect of this (lifecycle, extending GenServer code somehow, etc.) rather than talking to an actual persistence layer. The naive version might just add a persistence call in each callback that returns new state.

Thanks for the responses so far guys!
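On the “extending GenServer (is this possible?)” option: there’s no subclassing, but you can wrap GenServer in your own `use` macro. A rough sketch, again assuming the hypothetical MyApp.Persistence module from above:

```elixir
defmodule MyApp.DurableGenServer do
  defmacro __using__(_opts) do
    quote do
      use GenServer

      # Persist the new state, then build the standard GenServer reply tuple.
      defp persist_and_reply(reply, new_state) do
        :ok = MyApp.Persistence.save(__MODULE__, new_state)
        {:reply, reply, new_state}
      end
    end
  end
end

defmodule MyApp.Counter do
  use MyApp.DurableGenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  @impl true
  def init(default) do
    # Rebuild from persistence on every (re)start, falling back to the default.
    case MyApp.Persistence.load(__MODULE__) do
      {:ok, state} -> {:ok, state}
      :error -> {:ok, default}
    end
  end

  @impl true
  def handle_call(:increment, _from, count) do
    new = count + 1
    persist_and_reply(new, new)
  end
end
```

A fuller version could also inject a default init/1 that does the load and mark it defoverridable, so each implementation only has to declare its default state.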

Hey,

Did you get any further with this?
I’ve just hit the same kind of issue, using a process as a ‘root entity’ for things, but I need to persist it as well.

My first pass was, as you said, putting calls to save it directly in the callbacks. I’m thinking maybe just a wrapper like call_with_newstate that calls @entity_repository.persist() each time.
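A compact sketch of that wrapper, assuming a hypothetical repository module exposing load/1 and persist/1, read from application config:

```elixir
defmodule MyApp.RootEntity do
  use GenServer

  # Hypothetical repository module, read from config so the process
  # isn't tied to one persistence mechanism.
  @entity_repository Application.compile_env(:my_app, :entity_repository)

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id) do
    case @entity_repository.load(id) do
      {:ok, state} -> {:ok, state}
      :error -> {:ok, %{id: id}}
    end
  end

  @impl true
  def handle_call({:rename, name}, _from, state) do
    call_with_newstate(:ok, Map.put(state, :name, name))
  end

  # The wrapper: persist the new state, then build the usual reply tuple.
  defp call_with_newstate(reply, new_state) do
    :ok = @entity_repository.persist(new_state)
    {:reply, reply, new_state}
  end
end
```

Pushing call_with_newstate into a use-able module (as in the macro sketch above) keeps each callback to one line while leaving the repository swappable.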

But that just seems a bit … ‘dirty’ … lol