BEAM ordering guarantees when processing messages

benjreinhart · November 12, 2021, 8:47pm

Hey all, I have a question about how messages will be processed when one or more of those messages is long-running.

To start with, let’s assume a single CPU core and scheduler thread for simplicity.

Let’s say that Process A has two messages in its queue, m1 and m2, with m1 at the head of the queue. Let’s also say that handling m1 will take roughly 10,000 reductions, so about 2.5x the current per-time-slice reduction budget (4000). This means Process A will have to be scheduled three times before m1 is finished.

My question is: will m2 be scheduled and executed before m1 has fully completed? Or does the BEAM scheduler ensure that m1 has fully completed before processing other messages in the queue for that specific process?

I’m curious because if m1 is computing updated state based on an existing value, and m2 comes along and finishes first, there seems to be a race condition where m2’s state update would be overwritten once m1 finally completes.

kip · November 12, 2021, 8:54pm

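Assuming you are referring to a :gen_server process (which is recommended in most cases) then yes, m1 will be fully processed before m2. The scheduler preempts the process, not the message: when Process A runs out of reductions it is paused partway through m1 and later resumes from exactly that point, so m2 is not touched until the m1 callback returns.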

If you are hand-rolling your own receive loops then the behaviour is up to you (here be dragons).
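To illustrate why a hand-rolled loop makes ordering your responsibility, here is a minimal sketch using selective receive. It sends both messages to the current process for simplicity; the atoms :m1 and :m2 simply mirror the names used in the question. Even here the process only runs one receive clause at a time, but the pattern decides which message is taken first.

```elixir
# Both messages land in this process's mailbox, :m1 at the head.
send(self(), :m1)
send(self(), :m2)

# Selective receive: only :m2 matches, so :m1 is skipped and stays queued.
first =
  receive do
    :m2 -> :m2
  end

# A catch-all pattern takes whatever is now at the head of the mailbox.
second =
  receive do
    msg -> msg
  end

IO.inspect({first, second})
# prints {:m2, :m1}
```

A loop that always matches with a catch-all pattern would handle the messages in arrival order instead.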

benjreinhart · November 12, 2021, 9:07pm

I’m currently using Elixir’s Agent with use Agent in a module. I missed this before, but I just saw this from the Agent docs:

The first function blocks the agent. The second function copies all the state to the client and then executes the operation in the client. One aspect to consider is whether the data is large enough to require processing in the server, at least initially, or small enough to be sent to the client cheaply. Another factor is whether the data needs to be processed atomically: getting the state and calling do_something_expensive(state) outside of the agent means that the agent’s state can be updated in the meantime. This is specially important in case of updates as computing the new state in the client rather than in the server can lead to race conditions if multiple clients are trying to update the same state to different values.

So I think that answers it in this case. As long as the long-running computation happens inside the agent server, queued messages will not only start in order but also finish in order?
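The race the Agent docs warn about can be sketched as follows. This is an illustration only: the "two clients" are simulated in a single process by forcing both reads to happen before either write, and the counter state is an assumption made for the example.

```elixir
{:ok, agent} = Agent.start_link(fn -> 0 end)

# Server-side: each read-modify-write runs inside the agent, one at a time.
Agent.update(agent, fn n -> n + 1 end)
Agent.update(agent, fn n -> n + 1 end)
server_side = Agent.get(agent, & &1)

# Reset the counter, then do the same work client-side.
Agent.update(agent, fn _ -> 0 end)

# Both "clients" copy the state out before either one writes back.
a = Agent.get(agent, & &1)
b = Agent.get(agent, & &1)

# Each writes a value computed from its own stale copy.
Agent.update(agent, fn _ -> a + 1 end)
Agent.update(agent, fn _ -> b + 1 end)
client_side = Agent.get(agent, & &1)

IO.inspect({server_side, client_side})
# prints {2, 1}
```

The second increment is lost in the client-side version because the agent only serialises the individual get and update calls, not the computation done between them.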

kip · November 12, 2021, 9:20pm

Yes, that is correct (Agent is a :gen_server under the hood). :gen_servers serialise message execution, that’s part of the guarantee.
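A small sketch of that guarantee, for anyone who wants to see it run. The module name OrderDemo and the event atoms are made up for this example, and the loop size is just an assumption chosen to cost far more than one 4000-reduction time slice.

```elixir
defmodule OrderDemo do
  use GenServer

  # State is a list of events, newest first.
  @impl true
  def init(events), do: {:ok, events}

  # m1: deliberately slow, so the process is preempted many times mid-callback.
  @impl true
  def handle_cast(:m1, events) do
    events = [:m1_start | events]
    _sum = Enum.reduce(1..2_000_000, 0, &(&1 + &2))
    {:noreply, [:m1_done | events]}
  end

  # m2: trivial, would finish first if callbacks could interleave.
  def handle_cast(:m2, events), do: {:noreply, [:m2 | events]}

  # Return the events in the order they were recorded.
  @impl true
  def handle_call(:events, _from, events) do
    {:reply, Enum.reverse(events), events}
  end
end

{:ok, pid} = GenServer.start_link(OrderDemo, [])
GenServer.cast(pid, :m1)
GenServer.cast(pid, :m2)

events = GenServer.call(pid, :events)
IO.inspect(events)
# prints [:m1_start, :m1_done, :m2]
```

The :m2 event never lands between :m1_start and :m1_done, however many times the scheduler pauses the process during m1.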