What are the message delivery guarantees of Elixir when sending messages between two distributed Processes
? I couldn’t find a clear answer but I think it is at-most-once, is this correct?
Exactly once assuming that receiver is alive when message arrives to the remote node.
Here’s a very relevant article.
Applicable section:
Sending messages to processes on other Erlang nodes works in the same way, albeit there’s now a risk of messages being lost in transit. Messages are guaranteed to be delivered as long as the distribution link between the nodes is active, but it gets tricky when the link goes down.
I read that article before, but it remains a bit vague by saying “it gets tricky when the link goes down”. To me this sounds like it is possible that messages get lost when there is a disruption of the distribution link. Is that correct?
Can the message get lost in transit? For example, being sent from one node but never arriving at the other node due to something going wrong in between.
Regular TCP rules apply there. So if the message is lost it will be retried as per standard TCP rules.
It’s worth noting that send
doesn’t make any guarantees that the pid you are sending a message to is alive. If you want acknowledgements you’ll need to wait to get an ack
message back. The mechanics that @hauleth highlights are still relevant as far as the mechanics of distributed messaging, but even in a non distributed context just that send
returned doesn’t mean that anybody handled your message.
Considering disconnection and messages being lost:
As you may have already found out, runtime does not return any information to the sender about whether message was sent or not. You don’t need distribution to see how it works
iex> pid = spawn fn -> raise "I am so dead" end
iex> # Wait for exception
iex> send(pid, :anything)
:anything
So, like in any distributed system, sender must make sure that a message was received by awaiting on a response like:
iex> pid = spawn fn -> receive do: ({x, caller} -> send(caller, :received)) end
iex> send(pid, {:something, self()})
iex> receive do
...> :received -> :ok
...> after 5_000 -> {:error, :no_response}
...> end
And this rule applies to literally any distributed system (even to parallel programs on a single machine). For further explanations on how this works check out GenServer.call
documentation and gen_server:call
implementation
Considering other guarantees
-
send
either sends one message, either sends nothing. There is no automatic retry and there is no check if the sender is even alive. However, it is guaranteed that one send will never result in two or more messages in message queue of the receiver -
send(:x) ; send(:y)
if both messages are successfully sent, the:x
will be the first in message queue of the receiver and the:y
will be the second in queue