What happens to a process mailbox if a machine restarts?

yarrichar · December 17, 2023, 10:36pm

I’m very new to Elixir, so sorry if this is a silly question:

I am wondering what happens to the mailbox of a process when the machine it’s running on dies / restarts / etc? I don’t want to lose messages obviously.

Are process mailboxes persistent, or is there some other way this is handled?

al2o3cr · December 17, 2023, 10:37pm

Nothing is done with it; it exists only in memory for as long as the process is running (and the node is alive, etc).

If you want durable messages / reliable delivery / etc, you’ll need to explicitly set that up.
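
A small sketch of what "only in memory" means in practice, using only standard-library calls. The sleeping process is a stand-in chosen for illustration; it never reads its mailbox, so the messages simply sit there until the process goes away.

```elixir
# Spawn a process that never reads its mailbox.
pid = spawn(fn -> Process.sleep(:infinity) end)

send(pid, :hello)
send(pid, :world)

# Both messages are queued in the process's in-memory mailbox.
{:message_queue_len, 2} = Process.info(pid, :message_queue_len)

# Kill the process and wait for confirmation that it is gone.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# The process and its mailbox no longer exist.
nil = Process.info(pid, :message_queue_len)

# Sending to a dead local pid does not raise; the message is just dropped.
:hello_again = send(pid, :hello_again)
```

Nothing here is written to disk at any point, which is why a node or machine restart has the same effect as the kill above.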

smathy · December 17, 2023, 10:43pm

Just wanted to add/highlight that the mailbox is lost not just when the machine restarts, but whenever the process ends for any reason.

yarrichar · December 17, 2023, 10:49pm

Oh, interesting. So if the process crashes and the supervisor restarts it, the mailbox is still lost?

yarrichar · December 17, 2023, 10:53pm

Thanks for the quick reply.

Are there any standard patterns or libraries for helping with this? I don’t want to re-invent the wheel if I don’t have to.

My first thought is to:

1. Save incoming external messages to the DB as soon as they’re received.
2. Send a message to the process saying there’s something for it to do.
3. The process pulls the next message from the DB.
4. The process marks it as done when complete.

This feels like it might be a bit heavy-handed though.
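
A rough sketch of those four steps, with assumptions stated up front: the "DB" is replaced by a `:dets` file from the Erlang standard library so the example stays self-contained, message ids are assumed to be increasing integers, and the `DurableInbox` and `Workflow` modules are hypothetical names invented for illustration. A real application would more likely use a database table.

```elixir
defmodule DurableInbox do
  @table :durable_inbox

  # Every process that uses the table opens it; :dets shares one table per name.
  def open(path) do
    {:ok, @table} = :dets.open_file(@table, file: String.to_charlist(path), type: :set)
    :ok
  end

  # Step 1: persist the message before anything else happens.
  def store(id, payload), do: put(id, :pending, payload)

  # Step 4: mark the message as done.
  def mark_done(id, payload), do: put(id, :done, payload)

  # Step 3: fetch the pending message with the lowest id, or nil if none.
  def next_pending do
    @table |> :dets.match_object({:_, :pending, :_}) |> Enum.sort() |> List.first()
  end

  def pending_count, do: length(:dets.match_object(@table, {:_, :pending, :_}))

  defp put(id, status, payload) do
    :ok = :dets.insert(@table, {id, status, payload})
    # Flush to disk so the record survives a crash of the node.
    :dets.sync(@table)
  end
end

defmodule Workflow do
  use GenServer

  def start_link({path, handler}), do: GenServer.start_link(__MODULE__, {path, handler})

  @impl true
  def init({path, handler}) do
    :ok = DurableInbox.open(path)
    # Pick up anything a previous incarnation left pending.
    send(self(), :work_available)
    {:ok, handler}
  end

  # Step 2 arrives here: the message carries no payload, only a wake-up.
  @impl true
  def handle_info(:work_available, handler) do
    drain(handler)
    {:noreply, handler}
  end

  defp drain(handler) do
    case DurableInbox.next_pending() do
      nil ->
        :ok

      {id, :pending, payload} ->
        handler.(payload)
        DurableInbox.mark_done(id, payload)
        drain(handler)
    end
  end
end

# Usage: store first, then notify.
path = Path.join(System.tmp_dir!(), "inbox_#{System.unique_integer([:positive])}.dets")
:ok = DurableInbox.open(path)
DurableInbox.store(1, "first")
DurableInbox.store(2, "second")

parent = self()
{:ok, pid} = Workflow.start_link({path, fn payload -> send(parent, {:handled, payload}) end})
send(pid, :work_available)
```

Because a message is marked done only after the handler returns, a crash between those two calls means the message is handled again after a restart. That is at-least-once delivery, so the handler would need to tolerate duplicates.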

al2o3cr · December 17, 2023, 11:19pm

What specifically do you mean by “external” in this context?

This description needs more detail to narrow down the possible implementations:

- Oban could be viewed as a version of this: an “external” message inserts a job, which is then routed to a worker process
- so could Kafka, although the “send a message” part is replaced with a “check for messages” interaction
- lots of others, from work-queues to durable append-only logs

Another challenge with introducing this kind of persistence is making sure that messages are replayable - for instance, GenServers are frequently used as “stateful containers” so you might see a sequence of messages like:

- message 1, set up some initial state
- message 2, do additional stuff with things created in message 1
- message 3, do final work and clean up state from message 1 and message 2

Just saving messages that haven’t been handled isn’t enough, since sending a restarted server “message 3” won’t have the state set up in the first two messages.
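
To make the replay problem concrete, here is a small sketch of such a stateful container. The `Order` module and its `:open` / `:add` / `:close` messages are hypothetical names standing in for messages 1, 2 and 3.

```elixir
defmodule Order do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(nil), do: {:ok, nil}

  # Message 1: set up initial state.
  @impl true
  def handle_call({:open, id}, _from, nil), do: {:reply, :ok, %{id: id, items: []}}

  # Message 2: build on the state created by message 1.
  def handle_call({:add, item}, _from, %{items: items} = state),
    do: {:reply, :ok, %{state | items: [item | items]}}

  # Message 3: finish the work and clean up.
  def handle_call(:close, _from, %{items: items}),
    do: {:reply, {:ok, Enum.reverse(items)}, nil}

  # Any message that arrives without the state it depends on.
  def handle_call(_msg, _from, state), do: {:reply, {:error, :unexpected_state}, state}
end

# The full sequence works.
{:ok, a} = Order.start_link()
:ok = GenServer.call(a, {:open, 1})
:ok = GenServer.call(a, {:add, :widget})
{:ok, [:widget]} = GenServer.call(a, :close)

# A replacement process that is only replayed "message 3" has nothing to close.
{:ok, b} = Order.start_link()
{:error, :unexpected_state} = GenServer.call(b, :close)
```

So a durable message store needs to be paired with either a persisted copy of the process state or a replay of every message since the last known-good state.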

rvirding · December 17, 2023, 11:32pm

I think you need to be aware that when a process dies it completely goes away, and everything in it, like memory and messages, is lost. A process cannot be restarted, ever! When we say a supervisor restarts a crashed process, what we actually mean is that the supervisor creates a completely new process to take its place.
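
This can be observed directly. The sketch below uses a hypothetical `Counter` GenServer under a standard `Supervisor`; after the original process is killed, the registered name points at a different pid and the state is back to its initial value.

```elixir
defmodule Counter do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, _sup} = Supervisor.start_link([Counter], strategy: :one_for_one)

old_pid = Process.whereis(Counter)
1 = GenServer.call(Counter, :increment)

# Kill the process and wait until it is confirmed dead.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# Poll until the supervisor has started the replacement.
new_pid =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Process.whereis(Counter)
  end)
  |> Enum.find(&(is_pid(&1) and &1 != old_pid))

# Same name, different process, fresh state.
true = new_pid != old_pid
0 = GenServer.call(new_pid, :value)
```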

yarrichar · December 18, 2023, 12:01am

In my case I was wanting to add support for arbitrary workflows in my app, with each workflow being handled by a single process. So external just means any user request, or message from an external system that needs to be processed by a workflow / process.

Oban Pro actually sounds pretty similar to what I was thinking.

To use a queue as the durable store, wouldn’t I basically need a queue per process, since a process doing a “check for messages” should receive only its own messages? I think having another process in the middle, with the job of pulling messages from the queue and distributing them, would introduce ordering issues if something goes wrong.

Yeah, agreed - I was planning on also getting the process to store its current state to the DB.

yarrichar · December 18, 2023, 12:08am

Ok, cool. Thanks for the reply.

Still getting my mental model right after too many years doing J2EE stuff.

stwf · December 20, 2023, 1:45am

If you didn’t want to involve Oban and the db, you can isolate things much better than by sending a bunch of messages to a worker process.
Create one process that just holds the data list; if it’s simple enough it should never crash (lol). Then do all of the processing inside a different process. It would take one event at a time, so even if it crashed you would lose only that bad one at worst. You could even monitor the process and re-add the data in case of a crash. But none of these would survive a machine reboot. For that you need to write it somewhere.
Postgres and Oban would work, as would pushing each message into SQS and letting Broadway pull them out. Each solution has its place depending upon your requirements.
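
One possible shape for the in-memory variant, as a sketch only. The `Holder` module is a hypothetical name, and two details are assumptions not taken from the post: each event runs in its own short-lived monitored process, and a `@max_attempts` cap stops a bad event from being re-added forever.

```elixir
defmodule Holder do
  use GenServer

  # Assumption: give up on an event after this many tries.
  @max_attempts 2

  # `fun` is the per-event work; it always runs outside the holder process.
  def start_link(fun), do: GenServer.start_link(__MODULE__, fun)
  def push(holder, event), do: GenServer.cast(holder, {:push, event})

  @impl true
  def init(fun), do: {:ok, %{fun: fun, queue: :queue.new(), current: nil}}

  @impl true
  def handle_cast({:push, event}, state) do
    {:noreply, maybe_start(%{state | queue: :queue.in({event, 1}, state.queue)})}
  end

  # The worker for the current event has finished or crashed.
  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{current: {ref, event, attempt}} = state) do
    queue =
      if reason != :normal and attempt < @max_attempts do
        # Crash: re-add the event at the back of the queue.
        :queue.in({event, attempt + 1}, state.queue)
      else
        state.queue
      end

    {:noreply, maybe_start(%{state | queue: queue, current: nil})}
  end

  # Start the next event only when nothing is in flight (one at a time).
  defp maybe_start(%{current: nil, fun: fun} = state) do
    case :queue.out(state.queue) do
      {{:value, {event, attempt}}, rest} ->
        {_pid, ref} = spawn_monitor(fn -> fun.(event) end)
        %{state | queue: rest, current: {ref, event, attempt}}

      {:empty, _queue} ->
        state
    end
  end

  defp maybe_start(state), do: state
end

# Usage: the :bad event always crashes its worker; the holder is unaffected.
parent = self()

{:ok, holder} =
  Holder.start_link(fn event ->
    send(parent, {:try, event})
    if event == :bad, do: exit(:boom)
  end)

Enum.each([:a, :bad, :c], &Holder.push(holder, &1))
```

The holder never runs user code itself, so a failing event can only take down its own worker. Everything still lives in memory, so this does nothing for a node or machine restart.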