What to use to rehydrate process state

We have a stateful Elixir service at work running in a Kubernetes cluster which keeps some user-related ephemeral data in processes (think of it as a session process). At the moment we lose this state on a redeployment, which hasn’t been an issue yet. We do plan to keep additional information in there, which is why we now want to investigate how to keep this state around beyond a deployment.

The current idea is to run multiple instances of the app and also sync the process state into an in-memory data store. If a node then goes down - for example due to a deployment - new processes can be spun up on other nodes which rehydrate their state through this synced state.

I’ve spent some time looking into our options:

  • mnesia
  • riak_core
  • redis

mnesia and riak_core both have the charm of running alongside our application without needing to spin up something separate, as in the case of redis. Now I have some thoughts on each of these options:

mnesia

From what I’ve read so far mnesia seems like a solid choice as it comes with OTP, but it has one caveat: no built-in support for handling split-brain scenarios.

Since we’re running this app in a Kubernetes cluster and not on a telephone switch with a shared backplane, split-brain scenarios are not something we can ignore. This is not necessarily a deal breaker for us but it makes the next contender much more interesting.

riak_core

My current understanding is that riak_core is what powers riak KV, riak TS etc… It’s also better equipped to handle a split-brain scenario than mnesia is.

What has been confusing me though is: how the hell do I use it?

I’ve tried to install riak_core_ng but immediately ran into issues, as it depends on a fork of poolboy (version spec ~> 0.8.4) while we’re using the “real” poolboy at version 1.5.2. We’re also running on OTP 23, which I expect to give us trouble since the latest release of riak_core_ng was mid 2018.

redis

Last but not least: Redis. It’s a pretty straightforward choice but kinda makes me sad. We can’t run it inside the same BEAM instance, and it would require encoding and decoding our state (which should be manageable with erlang:term_to_binary/1, but still).
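To make the encoding concern concrete, here is a minimal sketch of the serialize/deserialize step such a setup would need. The `SessionState` struct and the key scheme are made up for illustration, and the actual Redis calls (e.g. via Redix) are only indicated in comments since they need a running server:

```elixir
# Sketch of the encode/decode step for storing process state in an
# external store like Redis. SessionState and its fields are hypothetical;
# the Redix calls are shown only as comments.
defmodule SessionState do
  defstruct user_id: nil, cart: [], last_seen: nil
end

defmodule SessionCodec do
  # Serialize any Elixir term into a binary suitable for a SET command.
  def encode(state), do: :erlang.term_to_binary(state)

  # The :safe option prevents a corrupted or hostile binary from the
  # store from creating new atoms or external function references.
  def decode(binary), do: :erlang.binary_to_term(binary, [:safe])
end

state = %SessionState{user_id: 42, cart: [:apples], last_seen: ~U[2021-01-01 00:00:00Z]}
blob = SessionCodec.encode(state)
# With Redix this would roughly be:
#   Redix.command!(conn, ["SET", "session:42", blob])
#   blob = Redix.command!(conn, ["GET", "session:42"])
^state = SessionCodec.decode(blob)
IO.puts("round-trip ok for user #{SessionCodec.decode(blob).user_id}")
```

The round-trip is lossless for plain Elixir terms, so the main costs are the serialization overhead on every state change and versioning the state shape across deploys.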

Right now it seems like the simple choice though, and I like simple.

I’d be interested to hear your perspective on this, big bonus when you have actual prod experience to back it up.


I’d lean towards Redis. Yes, it incurs a little extra overhead, but my thinking goes like this: since you are already using Kubernetes, spinning up one more container (which could also hold state for other services in the future) isn’t such a big deal, and Redis itself is a pretty standard tool for these scenarios.

I’d usually lean all the way to Erlang/Elixir software when I am coding a deployment monolith (although in your case I’d likely go for something even simpler like CubDB) but when k8s is involved you might as well go for standard industry practices.

That’s interesting. From a quick glance CubDB seems to have no interest in offering a distribution solution, is that right?

This one might be interesting to you as well: Erleans


No, it doesn’t. It’s more like a per-node pure Elixir mini DB.


That does look interesting but I feel like it’s a bit more frameworky than what I’m looking for. Thanks for the suggestion nonetheless!

I did some more research and came across DeltaCRDT which is what Horde is using under the hood.

Any experience with/opinion on that, @LostKobrakai/@dimitarvp?

I’m afraid I can’t give you a very informed take. I used DeltaCRDT once in a hobby project and liked it a lot; that being said, I’ve heard of people being displeased with CRDTs in general, and with the fact that you sometimes still have to resolve conflicts manually (imagine two clients without internet making slightly conflicting changes and then both trying to synchronize after they got back online).

However, after tinkering with my hobby project for a while, I was left with the impression that CRDTs are very applicable in many areas. It just seems that people want to try them in scenarios that are either impossible (parallel conflicting changes) or not meant for them (offline-first apps).
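To illustrate the conflict-free merge idea in miniature, here is a toy sketch using the simplest CRDT, a grow-only set. Real libraries like DeltaCRDT implement far richer delta-based types, but the core property is the same: merge is a commutative, associative, idempotent union, so replicas can sync in any order and still converge:

```elixir
# Toy G-Set (grow-only set) CRDT: the simplest conflict-free type.
# Elements can only be added; merging two replicas is just set union.
defmodule GSet do
  def new(elems \\ []), do: MapSet.new(elems)
  def add(set, elem), do: MapSet.put(set, elem)
  def merge(a, b), do: MapSet.union(a, b)
end

# Two replicas diverge while partitioned...
replica_a = GSet.new([:login]) |> GSet.add(:clicked_buy)
replica_b = GSet.new([:login]) |> GSet.add(:opened_help)

# ...and converge to the same state regardless of merge order.
merged_ab = GSet.merge(replica_a, replica_b)
merged_ba = GSet.merge(replica_b, replica_a)
IO.inspect(MapSet.equal?(merged_ab, merged_ba)) # prints true
```

The limitation the post above hints at shows up as soon as you need removal or “last write wins” semantics: then you need more elaborate types (OR-sets, causal contexts), and some conflicts genuinely cannot be resolved automatically.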

But again, take this with a big grain of salt. I am not an expert, just a guy who decided to see what this thing so many people talked about was. They just don’t seem very applicable to decentralized/federated scenarios. I liked using CRDTs and I can see applications for them, but I can’t speak with much authority about the subtler nuances.


There is the library unsplit, which makes resolving split-brain scenarios easier.
It was never released on Hex.pm and it is quite old, but I do believe it is used in a couple of serious applications/libraries. One example that comes to mind: the user-authentication library Pow uses it.

That’s what I was implicitly referring to when I wrote that “it’s not a dealbreaker”. There’s also reunion.

I haven’t looked at any of those in detail though, as I first wanted to check other options which seemed more promising at the time.


There might also be another approach to this that does not involve distributed state but rather leans on the more exotic features of the BEAM: hot code upgrades.

Of course they’re more finicky to get right than regular deployments, but if there are only a few processes whose state updates you need to handle, it might be doable, provided that you don’t care about accessing the state from all nodes but only about keeping the state alive on the node that is about to be updated.
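For reference, the place where a process’s state survives a hot code upgrade is the `code_change/3` GenServer callback: it receives the old state and must return the migrated one. A minimal sketch, with an entirely hypothetical session state shape (in a real release the release handler invokes this callback; calling it directly below just demonstrates the migration):

```elixir
# Sketch of the GenServer callback a hot code upgrade relies on.
# The session state shape here is hypothetical.
defmodule Session do
  use GenServer

  def init(arg), do: {:ok, arg}

  # v1 state was {user_id, cart}; v2 switches to a map and adds
  # a :last_seen field. The release handler calls this on upgrade.
  def code_change("1", {user_id, cart}, _extra) do
    {:ok, %{user_id: user_id, cart: cart, last_seen: nil}}
  end

  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end

# In a real upgrade the runtime invokes this; calling it directly
# just shows the state migration:
{:ok, new_state} = Session.code_change("1", {42, [:apples]}, nil)
IO.inspect(new_state.user_id) # prints 42
```

Writing these migrations is exactly the “finicky” part: every released version pair needs a correct state translation, in both upgrade and downgrade direction.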

Taking this train of thought a bit further, while also playing to Kubernetes’ strengths (because hot code upgrades are allegedly not a scenario that ephemeral containers are made for): you could consider deploying a sidecar container inside the app’s pod that polls for newer versions, writes them to a shared volume, and then triggers the hot code upgrade from that second container.

Phew… just writing all this out makes one realize how complicated the whole process can get :sweat_smile: …but if you do need persistent state across deployments and would rather tackle hot code upgrades than distribution, then perhaps it’s another option to consider as well.


Some notes from a few prod services running Elixir on k8s:

If you need some data after a restart, that data isn’t ephemeral anymore, and you have to deal with it like any other persistent data, probably with different performance/reliability requirements in this case. You mentioned you need to handle split-brain scenarios - so how are you handling split-brain with your other persistent data?

The easiest way is to use the same persistent store (e.g. your database). This is fine unless the “less important” data causes a bottleneck; in that case, you may run a separate instance of the same kind of storage instead of introducing a new kind of storage.

You mentioned Redis - that’s a good choice (super easy to run). However, introducing a new type of external system needs good justification, for both app developers and infra management.

One of my projects actually stores all data in the database. We had to make sure the new cluster would not start jobs before the existing cluster had stopped and pushed all its data into the database. With some work, if you carefully divide features, you can actually achieve zero downtime, with degraded performance on the features that require handing over the data.

In another project I simply use Oban background job processing instead of running a long-running process and worrying about redeployment. That project has relatively low volume, so the choice was obvious, and it works great.

Oh, and yes, we also hit a network split in k8s which led to a split Erlang cluster :sob: I highly recommend sticking with simple centralized storage instead of baking these tools into your app’s Erlang cluster. You should avoid adding more responsibility and requirements to your application when it’s not its core job.

If you move to an Erlang cluster with hot-code reloading, some problems become very easy to solve - but you also need to solve new problems which are already covered by k8s and your existing setup (such as CI/CD). If your app’s scale and challenges can justify the investment, that would be great. In my case it was not an option.


That is actually an interesting question, how to handle state.

  • If you really need session data to persist across reboots, it is not ephemeral anymore. That is neither good nor bad, but you now need to worry about state
  • Running multiple instances that access the same state adds distribution to your list of problems

I would take a look at event-based state handling. Since it is based on messaging, it fits rather perfectly into Erlang. The upside is that every node keeps its own database; I would suggest mnesia in that case.

Merging state seems easier to me than handling concurrent access to coupled state.
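Since mnesia ships with OTP, the per-node setup suggested above needs no extra dependency. A minimal single-node sketch with a RAM-only table (the table and record names are made up for illustration):

```elixir
# Minimal single-node mnesia sketch: RAM-only table, no disc schema.
# mnesia ships with OTP, so there is nothing extra to install.
# Table and record names are hypothetical.
:ok = :mnesia.start()

{:atomic, :ok} =
  :mnesia.create_table(:session_event, [
    # each record is a tuple {:session_event, key, payload}
    attributes: [:key, :payload],
    ram_copies: [node()]
  ])

# Write and read inside a transaction:
{:atomic, :ok} =
  :mnesia.transaction(fn ->
    :mnesia.write({:session_event, {42, 1}, %{type: :login}})
  end)

{:atomic, [event]} =
  :mnesia.transaction(fn ->
    :mnesia.read(:session_event, {42, 1})
  end)

IO.inspect(event) # prints {:session_event, {42, 1}, %{type: :login}}
```

In a cluster you would list several nodes under `ram_copies:` and let mnesia replicate writes, which is where the split-brain caveat from earlier in the thread comes back into play.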

Hi all,

I’m no Elixir devops guru, but I always thought that the classical BEAM answer to this problem is to not reboot at all, but to do hot code reloading.

Has that gone so far out of fashion that it’s considered a bad idea? If so, why? Something about hot code reloading itself, or is it just the assumptions of most deployment/orchestration software that “new version = restart”?


I think the common answer is relatively straightforward. It consists of two parts:

  1. Making sure an upgrade is ‘hot-code backwards compatible’ with the previous version can get very tricky/hairy. I remember reading somewhere that a few companies relying on hot upgrades were spending ~50% of their time on making sure the upgrades were correct (leaving only the other 50% for building new features). And of course some things can never be hot-upgraded, no matter how hard you try.
    Some examples: Erlang/OTP upgrades, Elixir upgrades, libraries that moved modules around in a way that resulted in splitting/merging of OTP applications, and of course kernel/OS (security) upgrades and the likes.
  2. If there is data which should never be lost, you should not rely on hot upgrades alone, because power failures or other problems that force the BEAM or the whole machine to restart do sometimes happen.

So hot upgrades are great to ensure that during most upgrades, sessions (like an ongoing phone call or websocket connection) do not have to be broken.
Combining this with using a cluster with multiple machines where you disconnect single machines one by one (once they have no sessions left) to perform a ‘cold’ upgrade on that machine and only then reconnect, allows you to have very high and consistent uptime for your cluster as a whole.

But it is not a useful tool if a cluster setup like the one described above would be overkill, too expensive, or otherwise not a good fit for your project,
and it does not obviate the need to persist the important parts of your data.


Yes, Erleans is more of a framework, but that’s because the problem calls for a framework-style solution. Without a framework you have to handle not only the state storage but the process lifecycle within the cluster as well.

That doesn’t mean Erleans will fit your use case - I’m just saying it’s a problem that benefits from a framework, essentially a layer on top of the OTP framework.

A while back (~2 years ago) I was considering an alternative approach that worked as a library rather than a framework.

I never fully finished the work on it, as my priorities unfortunately had to shift, but it is possible to write a custom process registry that does the dehydration/rehydration of processes for you: PersistentGenServer on GitHub and its ElixirForum topic.

Do note that by itself it only tackles the hydration part, not the ‘distributed persistence’ part. But maybe it can serve as a useful idea or building block for someone’s project.
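For anyone curious what the dehydrate/rehydrate pattern looks like in miniature: this is not the actual PersistentGenServer API, just a generic sketch of “load state in init/1, persist on every change”, with an ETS table standing in for whatever external store you’d really use:

```elixir
# Generic dehydrate/rehydrate sketch (not the PersistentGenServer API).
# An ETS table stands in for the external store; all names hypothetical.
defmodule Store do
  def start, do: :ets.new(:rehydrate_store, [:named_table, :public])
  def save(key, state), do: :ets.insert(:rehydrate_store, {key, state})

  def load(key, default) do
    case :ets.lookup(:rehydrate_store, key) do
      [{^key, state}] -> state
      [] -> default
    end
  end
end

defmodule RehydratingSession do
  use GenServer

  def start_link(key), do: GenServer.start_link(__MODULE__, key)

  # Rehydrate: pick up whatever a previous incarnation saved.
  def init(key), do: {:ok, {key, Store.load(key, %{count: 0})}}

  def handle_call(:bump, _from, {key, state}) do
    state = %{state | count: state.count + 1}
    Store.save(key, state) # dehydrate on every change
    {:reply, state.count, {key, state}}
  end
end

Store.start()
{:ok, pid} = RehydratingSession.start_link(:user_42)
1 = GenServer.call(pid, :bump)
GenServer.stop(pid)

# A "new" process picks up where the old one left off:
{:ok, pid2} = RehydratingSession.start_link(:user_42)
2 = GenServer.call(pid2, :bump)
IO.puts("rehydrated count: 2")
```

Swapping the ETS store for Redis or a database gives you the persistence-across-nodes part; the hard remaining questions are when to persist (every change vs. on terminate) and who restarts the process on the surviving node.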
