Why do BEAM nodes try to connect to all other nodes in the network?

Hello,

Why does the BEAM try to connect each node to all other nodes in the network?

For example, let’s say I have nodes A, B, and C. A needs to talk with B, B needs to talk with A, and C only talks with B.

To do this, I configure libcluster with a topology that looks like this:

A <-------> B <-------> C
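
A minimal sketch of such a topology, as it might look in node A's config, assuming libcluster's `Cluster.Strategy.Epmd` strategy. The topology name and node names are hypothetical and only illustrate the shape of the setup:

```elixir
# config/runtime.exs on node A (hypothetical names throughout)
import Config

config :libcluster,
  topologies: [
    # `a_to_b` is an illustrative topology name, not one from the original setup.
    a_to_b: [
      strategy: Cluster.Strategy.Epmd,
      # A lists only B as a peer. C's config would likewise list only B.
      config: [hosts: [:"b@host-b"]]
    ]
  ]
```

Note that a config like this only tells libcluster which nodes to connect to initially. It does not limit which further connections Distributed Erlang sets up afterwards.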

I expected A and B to be connected, and B and C too, but A and C actually connect to each other as well.

This is OK in my development environment, but in production I have stricter firewall rules, so A cannot reach C's IP and vice versa.

Everything seems to work, but A and C still try to connect to each other, resulting in warnings like this in my log:

2021-09-25 18:47:47.463 [warn] #PID<0.3141.0> 
↳ global: :"candles@node1.candles.tip-off" failed to connect to :"rocket_dbs@node1.rocket_dbs.tip-off

So why is this the default behavior of the BEAM? Can I disable or configure it? What are its advantages and disadvantages?

One thing full connectivity matters for is obtaining cluster-wide transactional locks with the :global module. If every node knows every other node, this is not a problem. With a more unusual topology, ensuring that all nodes are aware of a lock is not trivial; I don’t know what guarantees :global makes in that case.

I think if your nodes have a heterogeneous topology, you should reconsider using Erlang clustering as a “service mesh”, or at least look into a different clustering protocol; the original use case for Erlang clustering is symmetrical redundancy, not a service mesh.

There have been attempts at alternative clustering protocols (e.g. “Partisan”). I think that project is very interesting, but I worry that the abstraction is not quite the right one.

According to the Erlang documentation,

Connections are by default transitive. If a node A connects to node B, and node B has a connection to node C, then node A also tries to connect to node C. This feature can be turned off by using the command-line flag -connect_all false, see the erl(1) manual page in ERTS.

and (as @ityonemo has mentioned)

-connect_all false

If this flag is present, global does not maintain a fully connected network of distributed Erlang nodes, and then global name registration cannot be used; see global(3).

So if you rely on global name registration to discover processes, turning this off will cause problems.
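
As a rough sketch, this is one way to check from a running node whether the `-connect_all false` flag quoted above was passed, and which nodes it is actually connected to. It assumes the node was started with something like `iex --sname a --erl "-connect_all false"`, where the node name `a` is hypothetical:

```elixir
# Assumes the node was started with the flag from the quoted docs, e.g.
#   iex --sname a --erl "-connect_all false"
# (the node name `a` is hypothetical)

# :init.get_argument/1 returns {:ok, values} if the flag was given
# on the command line, and :error otherwise.
case :init.get_argument(:connect_all) do
  {:ok, values} ->
    IO.inspect(values, label: "connect_all flag")

  :error ->
    IO.puts("connect_all flag not set; connections are transitive by default")
end

# Node.list/0 returns only the nodes this node is currently connected to,
# so on A it can be used to see whether C has shown up.
IO.inspect(Node.list(), label: "connected nodes")
```

With the flag set, connections should only exist where a node explicitly connected to another one, at the cost of the global name registration described in the quote.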

On libcluster’s README, it says:

Features

  • Easy to use provide your own distribution plumbing (i.e. something other than Distributed Erlang), by implementing a small set of callbacks. This allows libcluster to support projects like Partisan.

I haven’t tried Partisan yet, but it does not operate in “all-to-all” mode, and it provides some alternatives to the :global registry and process discovery. As I understand it, one of its selling points is handling exactly this kind of networking condition. But maybe you don’t need it, since A and C won’t talk to each other?
