The Soul of Erlang and Elixir - Saša Jurić - GOTO 2019

sasajuric · May 24, 2019, 11:52am

I responded to that question on the Reddit thread; here’s a copy-paste of my answer:

I feel that there are a couple of shortcomings.

First, in my opinion distributed BEAM is mostly intended to run on a fast and reliable network (such as a local network). While in theory it can also work on a slower or less reliable network (e.g. geographically dispersed machines connected via the Internet), in practice you might experience more frequent netsplits, which could cause various problems, such as worse performance or weaker consistency.

Another issue, which should be fixed in Erlang/OTP 22, is the fact that sending large messages via distributed BEAM might also cause netsplits. Prior to OTP 22, people usually solved this by using a side-mechanism (e.g. HTTP requests) to send large messages.

Finally, I’ve seen various reports that the practical size limit of a BEAM cluster is in the range of 50-100 nodes. The reason is that a BEAM cluster establishes a fully connected mesh (each node maintains a TCP connection to every other node), so beyond some size this starts to cause problems. As far as I know, the OTP team is working to improve this, but as of OTP 22 it is still not done.
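
To get a feeling for how quickly a full mesh grows, here is a rough back-of-the-envelope sketch in Python. It assumes exactly one connection per pair of nodes and only counts connections; it says nothing about the actual per-connection overhead, and the function name `mesh_connections` is made up for illustration.

```python
def mesh_connections(nodes: int) -> int:
    """Total number of pairwise connections in a fully connected mesh.

    Assumption: one connection per pair of nodes, so n * (n - 1) / 2.
    """
    return nodes * (nodes - 1) // 2


# Each node talks to every other node, so it holds (n - 1) connections.
for n in (10, 50, 100, 200):
    per_node = n - 1
    total = mesh_connections(n)
    print(f"{n:>4} nodes: {per_node:>4} connections per node, {total:>6} in total")
```

The per-node count grows linearly while the cluster-wide total grows quadratically: going from 50 to 100 nodes roughly quadruples the number of connections in the cluster (1225 vs. 4950).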

Most importantly, I feel that the ecosystem is lacking easy-to-use, reliable, thoroughly tested, higher-level abstractions. There are various initiatives/explorations available, such as swarm, horde, or lasp/partisan, but I’m currently not convinced that any of these is mature enough to be used in production. That said, I think I saw a few mentions of people using some of these libs in production, so take my statement with a grain of salt.