Let’s assume we have a set of microservices running in containers, communicating over gRPC or REST.
Am I right to think that, as long as they’re all coded in Elixir, making a cluster out of these services could potentially be simpler and more efficient than the protocols they’re using at the moment?
It is hard to tell whether Distributed Erlang would be more efficient. Right now it requires a persistent connection between each pair of nodes (that could change if someone ever implements Distributed Erlang over QUIC), so a large cluster can end up with a lot of connections.
Another misconception in your post is that all of the nodes need to be written in Elixir. They don't, and they don't need to be written in Erlang or any other BEAM language either. In theory nothing prevents you from building a Distributed Erlang cluster using any set of languages you want, via C Nodes or by implementing the Distribution Protocol manually. There are some existing projects that do that. So in the end you can have a whole cluster that uses Distributed Erlang without a single node in it running the BEAM.
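To put rough numbers on the connection count: a minimal sketch, assuming the default behaviour where every node ends up connected to every other node (a full mesh, no hidden nodes). The function names are made up for illustration, and Python is used only for the arithmetic.

```python
def full_mesh_connections(nodes: int) -> int:
    """Total connections when every pair of nodes holds one connection."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # Each of the n nodes connects to the other n - 1; every link is shared by two nodes.
    return nodes * (nodes - 1) // 2


def connections_per_node(nodes: int) -> int:
    """Connections a single node has to keep open in a full mesh."""
    return max(nodes - 1, 0)


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, connections_per_node(n), full_mesh_connections(n))
    # 5 4 10
    # 50 49 1225
    # 500 499 124750
```

So the per-node cost grows linearly, but the cluster-wide total grows quadratically, which is why this only starts to matter in larger clusters.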
Potentially, yes. Out of the box, no. At work we have a system where someone tried to be clever and made a heterogeneous service mesh using Erlang's built-in node discovery methods. I don't love it.
In particular, it’s rather hard to trace data flow across the node boundary (this could probably be improved if we were better at Telemetry), and there are some distributed systems failure modes that you have to think hard about (we probably should have made things GenServer calls instead of RPCs). Finally, there isn’t really a good “opinionated” way of testing.
If these issues were solved, it would probably be much better.
So I would generally only recommend using Distributed Erlang as a failover and reliability strategy with symmetric nodes, not as a service discovery strategy.
It is one way of doing it, but usually people who build a system via microservices do it to give different teams independence, and introduce strict boundaries between services, so using distributed Erlang for communication rather defeats the point. Might as well make it a monolith or umbrella app.
It can work for tiers or specialized nodes within an app though, if you don’t mind the security implications (compromising any node gives access to all the nodes). The book “Designing for Scalability with Erlang/OTP” covers that in chapter 13, “Distributed Architectures”.
You can take a look at Meshx - service mesh architecture.
Inter-node communication with custom binary RPC protocol as close to bare metal as possible. Transport over service mesh data plane with mTLS, LB, HA, ACL, etc.
One thing to consider is that Erlang’s internal distribution protocol is very bare bones. Your only tool is a raw Erlang RPC call. You lose all of the features you get from HTTP or gRPC or some other RPC-oriented protocol. You’re also limited to a single TCP connection per node, which can cause issues at high throughput.