Standard library for restoring process state and messages on restart

GuSuku · July 12, 2019, 4:17am

I am new to Elixir and have found some good resources (like this one) on strategies for preserving state when restarting a failed process.

I would like to know what standard, well-maintained libraries exist to help with this common need before deciding to hand-roll a custom one.

OvermindDL1 · July 12, 2019, 3:07pm

It’s all very specific to what the process does and how it should handle failure. Do you have a specific example?

GuSuku · July 13, 2019, 3:22am

Thanks. An example would be a state machine, instantiated once per user, that reacts to real-world events we are listening to for that user. In this case, when restoring a failed process that tracks a user, I want to lose neither the current state of that user nor the queue of events waiting to be handled by that process (including the event being processed when it failed).

This doesn’t sound too specific or fancy. On the contrary, it sounds like something that would be a common requirement. Where am I wrong on this?

LostKobrakai · July 13, 2019, 4:35am

What if the current state or message queue is the reason for the failure? One central requirement for the self-healing properties of the BEAM is to always start a process in a known-to-work state, which you don’t do if you keep state or messages around.
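One possible middle ground, sketched here as an assumption rather than something proposed in the thread, is to accept a saved snapshot only if it passes validation and otherwise fall back to a known-good initial state. The module name `UserTracker`, the `:snapshot` option, and the state shape are all hypothetical. Validation of this kind cannot catch every bad state, so it reduces the risk rather than removing it.

```elixir
defmodule UserTracker do
  use GenServer

  # The known-to-work state every fresh process can fall back to.
  @known_good %{status: :idle, events_seen: 0}

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # `:snapshot` is a hypothetical option carrying state saved elsewhere
    # (for example in ETS or a database) before the previous crash.
    {:ok, restore(Keyword.get(opts, :snapshot))}
  end

  # Accept the snapshot only if it has the expected shape and values.
  def restore(%{status: status, events_seen: seen} = snapshot)
      when status in [:idle, :active] and is_integer(seen) and seen >= 0 do
    snapshot
  end

  # Anything else (nil, wrong shape, invalid values) is discarded.
  def restore(_missing_or_invalid), do: @known_good
end

# A corrupt snapshot is ignored and the process starts from the known-good state.
{:ok, pid} = UserTracker.start_link(snapshot: %{status: :corrupt, events_seen: -1})
:sys.get_state(pid)
```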

GuSuku · July 13, 2019, 4:51am

One (partial) solution would be to replay the message on restart, but only a fixed number of times, after which the offset moves on to the next message.
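A minimal sketch of that bookkeeping, assuming the cursor is stored somewhere that survives the crash (storage is not shown). The module name `ReplayCursor`, the map shape, and the default limit of 3 are made up for illustration. Skipping is only safe if later events do not depend on the skipped one.

```elixir
defmodule ReplayCursor do
  # Hypothetical limit: how many crashes on one event are tolerated
  # before giving up on it.
  @max_failures 3

  # `offset` is the index of the next event to handle; `failures` counts
  # crashes on that event so far.
  def new, do: %{offset: 0, failures: 0}

  # The event at `offset` was handled: advance and reset the counter.
  def record_success(%{offset: offset}), do: %{offset: offset + 1, failures: 0}

  # The process crashed while handling the event at `offset`.
  # Keep replaying it until the limit is reached, then move past it.
  def record_crash(%{offset: offset, failures: failures}, max \\ @max_failures) do
    if failures + 1 >= max do
      %{offset: offset + 1, failures: 0}
    else
      %{offset: offset, failures: failures + 1}
    end
  end
end

# Three crashes in a row on the first event move the offset to the second one.
ReplayCursor.new()
|> ReplayCursor.record_crash()
|> ReplayCursor.record_crash()
|> ReplayCursor.record_crash()
```

On restart, the process would read the cursor, call `record_crash/2` if the previous run ended abnormally, and resume from the resulting `offset`.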

LostKobrakai · July 13, 2019, 5:06am

This requires that you can safely discard messages without invalidating the messages that follow.

Also keep in mind that a process might be restarted not because its own state or queue caused problems within the process boundaries, but because it is part of a bigger supervision tree that doesn’t work with the current state, so some supervisor above the current process initiates the restart. From your process’s point of view things can seem perfectly fine, but if another process has problems with your process’s state, you still need a way to resolve the problem.

tty · July 13, 2019, 5:11pm

IMHO this isn’t that common a need, and unfortunately the devil is in the details. I’ve coded this maybe thrice, and in each instance there wasn’t much commonality besides the grand idea of storing/restoring state.

I suspect you would be hand-rolling your own solution.