16,000 nodes at WhatsApp - how?

When WhatsApp says 16,000 nodes, I struggle to understand the networking side of things.

How many clusters? How many nodes per cluster? What is the network topology?

I have never run an application with more than a few nodes together, and I would love to learn how to use Erlang at this scale.

How do you do that?


Two videos related to how they did it :slight_smile:


I am sorry for the delay. I have watched the videos multiple times, and I keep getting confused; I have so many questions that I don’t know how to answer.

I am sorry if I ask things that the videos already answer. I am thinking out loud to make sure I am not making a mistake.

From this picture, https://youtu.be/LJx6mUEFAqQ?t=201

I am assuming that "frontend" is what they call the exposed servers.
I am assuming that each box is an Erlang VM.
I am assuming that each group of boxes is a cluster running the same OTP application.

Where I get confused is the connection between the Nodes and clusters.

How are they connecting their Nodes? Are they maybe using something like LibCluster or Peerage?

How are they connecting the clusters?

What do the arrows represent?

Do they represent connections at the VM level? Are they doing RPC calls using Erlang distribution?

If so, do they connect each Node from a cluster to each node from the other cluster?

These are the hardest questions for me, and I am not sure how to answer them when it comes to connecting clusters to clusters.
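To put rough numbers on the "every node to every node" question, here is a small sketch of the arithmetic. It is in Python only because the calculation is language-neutral. It assumes the default behaviour of Erlang distribution, where connected nodes end up in a full mesh. The function names and the cluster sizes of 100 are made up for illustration, and nothing here is taken from the videos.

```python
def full_mesh_links(n: int) -> int:
    """Connections needed when each of n nodes links to every other node."""
    return n * (n - 1) // 2


def cross_cluster_links(a: int, b: int) -> int:
    """Connections needed when every node in one cluster links to
    every node in another cluster (and nothing else is counted)."""
    return a * b


if __name__ == "__main__":
    # Hypothetical: all 16,000 nodes in one fully meshed cluster.
    print(full_mesh_links(16_000))

    # Hypothetical: two clusters of 100 nodes each, fully cross-connected.
    print(cross_cluster_links(100, 100))
```

A single full mesh of 16,000 nodes would need 127,992,000 connections, which is why I suspect they are not simply connecting everything to everything.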

I am sorry for the confusion and ignorance. I am getting overwhelmed trying to understand these Erlang concepts at a significant scale.

Most likely, in a non-Erlang ecosystem, these would be a bunch of servers with proxies, load balancers, and self-discovery systems, with a lot of networking tools doing the magic for me.

I would make a call to whatever aliased service name I need, without knowing where it is or how it is connected.

But I have a hard time understanding how to leverage the Erlang platform fully.

Today, a single cluster using https://github.com/mrluc/peerage is as far as I got when it comes to multiple servers. I never had to connect two clusters before (too powerful for me to even try).

And questions like where to put the load balancers if I am connecting cluster-to-cluster throw me off. I don’t know enough.


For cluster-to-cluster links, I have used VPN connections in the past. Just remember it's all the same access-time obstacles you hit when dealing with SIP on a non-contiguously connected internet.