Distributed Erlang: Control vs. Data

alchemist_ubi · August 10, 2021, 7:47am

Reading “Implementing Riak in Erlang: Benefits and challenges” http://gotocon.com/dl/goto-chicago-2014/Web/vinoski-impl-riak-erlang.pdf

Something got my attention,

I haven’t had the opportunity to fully comprehend this by actual work experience since most applications are not at Whatsapp or Riak scale.

Is that still true today, I know that the Erlang team has done huge improvement since 2014, but I am not sure where if this slice holds true today, or to what extent it does?

RudManusachi · August 10, 2021, 8:51am

I think it refers to the fact that “cross node” message passing is basically always copies the data. While inside the node we know that for example big binaries are stored in the shared heap and with the message to the process located in the same node only reference is passed, we cannot do that across nodes.

About multi-core erlang I have this paper in my bookmarks, but haven’t get to it yet personally: Characterizing the Scalability of Erlang VM on Many-core Processors

dom · August 10, 2021, 3:12pm

The problems with sending large terms over distribution were solved with this change in 2019:

github.com/erlang/otp

Implement fragmentation of large distribution messages

erlang:master ← garazdawi:lukas/erts/fragment-dist-messages

opened 02:35PM - 05 Feb 19 UTC

garazdawi

+3944 -1982

This PR implements fragmentation of large Erlang Distribution signals in order t…o prevent [head-of-line blocking](https://en.wikipedia.org/wiki/Head-of-line_blocking). This PR introduces two changes to the distribution protocol. 1. Move exit reason of EXIT, EXIT2 and MONITOR\_P\_EXIT to after the control message. 2. Introduce two new distribution headers that represent: 1. Start of a new sequence of fragments 2. A fragment in a sequence The new distribution headers look like this: | 1 | 1 | 8 | 8 | 1 | NumberOfAtomCacheRefs/2+1 \| 0 | N \| 0 | |-----|-----|-------|--------|-------------------------|--------------------------------|---------------| | 131 | 69 | SeqId | FragNo | NumberOfAtomCacheRefs | Flags | AtomCacheRefs | | 1 | 1 | 8 | 8 | |-----|-----|-------|--------| | 131 | 70 | SeqId | FragNo | * The atom cache is the same for the entire sequence. * The FragNo starts at the total number of fragments in the sequence and then decrements to 1, i.e. in a sequence of 2 fragments the start header has FragNo set to 2 and the following fragment has FragNo set to 1. * The old distribution header is still used for messages that do not need to be fragmented. The following restrictions exist when using the message fragmentation: * Only the payload of the message may be fragmented. The control sequence may not span across several fragments. * Only one sequence may be sent by one process at a time. * Fragments must arrive in the correct order. i.e. if a sequence consists of 4 fragments, then the fragments have to arrive as 4, 3, 2, 1. In addition to these changes to the Erlang Distribution protocol, this PR also fixes and optimizes many internal issues. * Yielding during processes exit when sending many exit/down messages * Change the distr inet driver to run in binary mode and fix dist.c to not copy the payload unneccisarily. * Trap when sending distributed exit/1, exit/2 and monitor down messages. NOTE: The documentation for the new distribution headers is not done yet.

In practice people send tons of data over distribution and it works fine nowadays. RabbitMQ for instance is a heavy user. Phoenix Pub/Sub as well.