Are you using a clustered Elixir deployment?

We’re considering our architecture with a view to scaling our traffic heavily over the next 6 months. Our current deployment runs in Fargate with two different task types, one for web and one for background jobs, which are scaled independently. Obviously with this configuration we’re not taking advantage of the BEAM’s clustering potential, so we’re considering what we might gain by moving to a clustered deployment.

I feel like this is a slightly neglected topic in the Elixir community, particularly as many folks are coming from languages where Docker is now the de facto unit of scaling.

If you’re using a clustered deployment I’d love to hear about what your deployment looks like and how you leverage the clustering. Also what platform do you deploy on? Are you using kubernetes/docker? Are you happy with it?

6 Likes

It depends a bit on what you’re looking for from clustering. We are on AWS / EKS (AWS-managed Kubernetes) and do use clustering. Our clusters are homogeneous: all nodes run the same code and perform the same roles. The main value we get out of clustering is easy use of Phoenix PubSub and other inter-node message passing.

6 Likes

We are in AWS Fargate. There are web nodes and 1 node for background things. All nodes run the same code; the bg node just has an extra ENV variable, and based on it we add more things to the supervision tree.
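
Roughly, the conditional part of the supervision tree looks something like the sketch below; the ENV variable name and the extra child modules are illustrative, not our exact setup.

```elixir
# application.ex - only the background node gets the extra children.
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    base_children = [
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]

    # BACKGROUND_NODE and the worker modules below are illustrative names.
    bg_children =
      if System.get_env("BACKGROUND_NODE") == "true" do
        [MyApp.PeriodicTasks, MyApp.ReportScheduler]
      else
        []
      end

    Supervisor.start_link(base_children ++ bg_children,
      strategy: :one_for_one,
      name: MyApp.Supervisor
    )
  end
end
```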

We are on our way to getting rid of the background server though, because we moved almost all periodic tasks to AWS cron jobs, where an instance is started just for the task and then killed.

About the cluster, we used Horde at first and had problems with it. Switched to Swarm last week and now it seems to run smoothly. Our application is stateful, so we need the cluster for nodes to talk to each other.

1 Like

I run an agency and we have multiple projects. In most of them we actually don’t have to use any clustering; it often turns out we’re running a pretty standard single-database-multiple-web-nodes or even single-database-single-web-node setup (yeah, Elixir is fast!). But there are projects where we do use clustering too.

Let’s start with use cases I see:

Actually the most common use case is that we want PubSub (as in Phoenix PubSub) functionality shared across the cluster, without the need for an additional dependency (like Redis). This is used by default by Phoenix Channels, but not only there. We also use the Absinthe GraphQL server, and when you do real-time push updates that way (with GraphQL subscriptions), you often have to trigger them with a PubSub message too, and you want all clients, connected to all nodes, to receive that update. The third, also PubSub-related, use case is LiveView. It’s similar to GraphQL subscriptions in that whenever some event happens, say a certain record gets updated, all currently connected LiveViews should receive a message and do something like render the new version of the updated record. We do that with PubSub as well, and when our nodes are clustered this is a no-brainer in usage and configuration.
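
To make the PubSub point concrete: with the nodes clustered, a plain subscribe/broadcast pair is all it takes, and the broadcast reaches subscribers on every connected node. A tiny sketch (the pubsub name, topic, and message are illustrative):

```elixir
# Subscriber side - e.g. in a LiveView mount or a channel join.
Phoenix.PubSub.subscribe(MyApp.PubSub, "records:42")

# Publisher side - e.g. right after the record is updated. Every subscribed
# process on every connected node receives {:record_updated, record}.
Phoenix.PubSub.broadcast(MyApp.PubSub, "records:42", {:record_updated, record})
```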

After PubSub, the second use case for clustering I think is the need to perform a cluster-wide lock, i.e. a critical section. Or more generally: limiting the concurrency of something cluster-wide, to one or N concurrent actions of a given kind. For example, if you track usage of your system and have pay-as-you-go or billing plan levels, you want to warn the user when they are approaching a limit, and then maybe suspend them, switch plans, or apply extra charges once they exceed their usage. You may want these events to happen precisely once. Or you want to throttle a user when they exceed some limit of API calls. You usually can do these things without clustering and rely on something like locking records in the database, but that has its own disadvantages, like the fact that your database connections are then held for a long time.
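
One dependency-free way to get such a cluster-wide critical section is :global.trans/2, which takes a named lock across all connected nodes before running the function. A minimal sketch (the lock id and the billing function are made up):

```elixir
# :global.trans/2 acquires the lock on all connected nodes, runs the function,
# then releases the lock; apply_overage_charge/1 is an illustrative function.
:global.trans({:billing_lock, self()}, fn ->
  apply_overage_charge(account_id)
end)
```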

The third category is things that can go wrong across multiple nodes. For example, you may need a circuit breaker that shuts off the part of the system that contacts an API which has started timing out, or you may need to rate-limit usage of an external API across the cluster. Again, it’s probably possible to do with something like Redis or database locking, but it’s just so much easier and more natural to do it with Elixir processes in a cluster.
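
As an illustration of the rate-limiting case, one simple approach is a single globally registered GenServer that every node calls. This is only a sketch: the name, limits, and window are made up, and a real version would also need to handle the process crashing or the cluster partitioning:

```elixir
defmodule MyApp.ApiLimiter do
  use GenServer

  @max_per_window 100
  @window_ms 60_000

  def start_link(_opts) do
    # {:global, name} registers the process cluster-wide; starting it on a
    # second node returns {:error, {:already_started, pid}}.
    GenServer.start_link(__MODULE__, %{count: 0}, name: {:global, __MODULE__})
  end

  # Every node calls this before hitting the external API.
  def acquire, do: GenServer.call({:global, __MODULE__}, :acquire)

  @impl true
  def init(state) do
    :timer.send_interval(@window_ms, :reset)
    {:ok, state}
  end

  @impl true
  def handle_call(:acquire, _from, %{count: count} = state) when count < @max_per_window do
    {:reply, :ok, %{state | count: count + 1}}
  end

  def handle_call(:acquire, _from, state), do: {:reply, {:error, :rate_limited}, state}

  @impl true
  def handle_info(:reset, state), do: {:noreply, %{state | count: 0}}
end
```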

And finally, caching and keeping the system “hot”, as in warmed up after deployments. If you need to keep something in memory (versus in a database or an external service), you can duplicate it on all of the nodes. You can do that without clustering. But you can also form a cluster and have a cluster-wide cache, provided your node-to-node connections are fast (which they should be). Or you can do a mixture of both, keeping some things on hand, in memory, on all nodes in the cluster, and other things local to a particular node. What then becomes interesting is the ability to pass that cache on to newly started instances. If you release to prod often, you may find your system needing some “warmup” time, when it doesn’t yet know what’s going on, has no caches, and is building them up as clients make requests. Starting from a blank slate like this may not be desirable, and it can even lead to serious performance issues if you deploy during high-traffic hours. So, by briefly forming a cluster between the shutting-down and starting-up nodes, you can pass the relevant state, i.e. caches/counters/processes, from the old version of the application to the new version.

When it comes to what platform we use:

We have been doing this on dedicated hardware, on EC2 instances (with ECS), and recently also on Gigalixir. The last option is definitely the easiest to set up, as they have figured most of these things out; even the state passing from shutting-down to starting-up nodes is possible (which I learned only recently, silly me!). Unfortunately I have no experience with your stack :/.

Hope that’s helpful!

15 Likes

How is that done? Really interesting.

1 Like

What are you asking about: how it is done by the infrastructure, or how the state passing is done on the application side?

Definitely the latter – didn’t know Erlang/Elixir could do that.

You can do this using the code_change callback in GenServers, triggered during hot code reloading. You can find the docs here https://hexdocs.pm/elixir/GenServer.html#c:code_change/3 and more info here https://blog.appsignal.com/2018/10/16/elixir-alchemy-hot-code-reloading-in-elixir.html in the section labeled “Transforming State”.
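
For illustration, the callback looks roughly like this; the old and new state shapes are made up:

```elixir
defmodule Counter do
  use GenServer

  # ... init/1 and handle_call/3 omitted ...

  # Called during a hot code upgrade so the running process can convert the
  # state kept by the old module version into what the new version expects.
  @impl true
  def code_change(_old_vsn, old_state, _extra) when is_integer(old_state) do
    # The old version stored a bare integer; the new version wants a map.
    {:ok, %{count: old_state}}
  end

  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end
```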

1 Like

Well, I think we are mixing two concepts. There are at least two ways to do what we are talking about here.

  1. Use Erlang / Elixir releases with hot code upgrade. I haven’t used this method in production, to be fair. But it works when you have a set of servers that you deploy your application to, and the updated application will run on the same servers. By servers I mean OS instances, as in EC2 instances or real hardware servers, that stay the same during deployment. This method doesn’t really care about clustering, which may or may not happen in parallel to hot upgrades. It’s just deploying a new version of code to the same servers that were running the previous version, and there are hooks in GenServers and friends to handle state passing between the ‘old version’ and ‘new version’ of the code that was just deployed: https://hexdocs.pm/mix/Mix.Tasks.Release.html#module-hot-code-upgrades

Again, these days most of the things I work on are not deployed to such static/dedicated servers, but to VMs created as needed by some piece of infrastructure and discarded after the application shuts down. This is the way anything Docker-based or Kubernetes-based works.

  2. Use clustering: no hot code upgrade, but cluster the starting and shutting-down instances.

This method is suitable for passing the in-memory state on deployments when you use something like Kubernetes. When you deploy a new version to the cloud, the old instance(s) of the application is/are still running in their own containers. During deployment, the piece of infrastructure you use creates new containers for the new release. These start their own little OS instances and run the application. Now, here’s the moment where your infrastructure may establish a link between the legacy version of the application and the new version, so you can pass state. Again, Gigalixir does that by default, I believe.

We always use https://github.com/bitwalker/libcluster to handle cluster formation, but how the cluster gets formed is not really important here.
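
For context, the libcluster setup is just a topology config plus a supervisor child. A minimal sketch using the Gossip strategy (the topology name is arbitrary, and on Kubernetes you would pick one of the Kubernetes strategies instead):

```elixir
# config/runtime.exs
import Config

config :libcluster,
  topologies: [
    my_cluster: [
      strategy: Cluster.Strategy.Gossip
    ]
  ]

# application.ex - add the cluster supervisor to the supervision tree.
children = [
  {Cluster.Supervisor,
   [Application.get_env(:libcluster, :topologies), [name: MyApp.ClusterSupervisor]]}
]
```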

What is important is that you can listen to events when nodes join or leave the cluster. Example code can be seen here: https://github.com/smartcitiesdata/horde_connector/blob/master/lib/horde_connector.ex#L40

So you monitor the nodes in your cluster from some process, get events when a node joins or leaves, and you can decide to pass state to a newly started node by sending a message containing the state to some process running on that node.
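
In its rawest form, without Swarm or Horde, that monitoring is just :net_kernel.monitor_nodes/1 plus a handle_info clause. A rough sketch, where the handoff target process name is made up:

```elixir
defmodule MyApp.NodeWatcher do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok) do
    # From now on this process receives {:nodeup, node} / {:nodedown, node}.
    :net_kernel.monitor_nodes(true)
    {:ok, %{cache: %{}}}
  end

  @impl true
  def handle_info({:nodeup, node}, state) do
    # Hand the in-memory state over to a (hypothetical) process on the new node.
    send({MyApp.CacheHandoff, node}, {:handoff, state.cache})
    {:noreply, state}
  end

  def handle_info({:nodedown, _node}, state), do: {:noreply, state}
end
```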

There are at least two projects that abstract most of the details here and do a lot of the legwork for you: one is Swarm, the other is Horde. With both you can start processes on the nodes in the cluster, and they will react to cluster formation, providing hooks to pass state. With Swarm it’s a bit easier (https://hexdocs.pm/swarm/readme.html#process-handoff), but we observed some nondeterministic behavior here, i.e. bugs. In theory it’s super sweet, however, and the API is really nice.

Then, you can do the same with Horde (https://github.com/derekkraan/horde) with a bit more legwork: https://hexdocs.pm/horde/state_handoff.html

11 Likes

Many thanks for posting this, it’s really helpful to hear about some concrete scenarios where you’ve decided to use/not use clustering.

1 Like

Yeah you’re right, I wasn’t thinking about Kubernetes because I’ve never used it personally, and probably wouldn’t want the additional complexity involved just to use Kubernetes with Elixir. If you have a polyglot stack then Kubernetes makes more sense, IMO. I’ve also not used Horde or Swarm before, but I’ll check them out, so thanks for the tip.

We’re using clustering but probably not in a way that you’re interested in. My current project is an embedded system of between 3 and 30 nodes, all telecomms-related equipment distributed around a site and networked together. The software on each node is responsible for configuring the attached hardware, but the nodes communicate with each other to co-ordinate this.

We do very basic cluster formation (manually, although I’m looking at libcluster) and then use :rpc.call() for the comms. This required very little faff to set up, and although we’re only in the early stages of development it seems to work well for us.
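
For the curious, the :rpc side of it is about as simple as it sounds. A rough sketch with made-up node, module, and argument names:

```elixir
# Call HardwareConfig.apply_settings/1 on a remote node in the cluster.
case :rpc.call(:"unit3@site-rack-01", HardwareConfig, :apply_settings, [%{port: 3, vlan: 12}]) do
  {:badrpc, reason} -> {:error, reason}
  result -> {:ok, result}
end
```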

3 Likes

@benwilson512 Are your apps containerized? Are you running on bare metal? What are you using for clustering?

@egze Were you experiencing issues using Elixir libraries to handle periodic tasks? If so, would you please share?

They are containerized; we use libcluster’s K8s service plugin and it works great!
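
For anyone curious, the configuration for that is small. A minimal sketch with one of libcluster’s Kubernetes strategies (there are a few variants; the selector and basename here are illustrative and depend on your manifests):

```elixir
# config/runtime.exs - Cluster.Strategy.Kubernetes discovers peer pods via the
# Kubernetes API using a label selector.
import Config

config :libcluster,
  topologies: [
    k8s: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        kubernetes_selector: "app=myapp",
        kubernetes_node_basename: "myapp"
      ]
    ]
  ]
```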

No, not at all. But we decided to use existing infrastructure tools that our DevOps support.

It also allows us to start periodic tasks with more memory than is needed to run the web application.

We have a few services in Elixir:

  • All run on docker
  • All run on k8s - except one (running on vm + docker + systemd - but we will move this to k8s too :wink: )
  • We don’t use hot code reloading at all - just the k8s way.
  • Some services are using erlang cluster, while some are just independent containers.

For services using erlang cluster:

  • We use libcluster to form a cluster from k8s dns, and it works very well
    • A while ago, I tried to set up erlang clusters on vms across AWS EC2 instances with ansible, and it wasn’t simple… I don’t remember the details but probably due to epmd.
  • We need erlang clusters for global knowledge (e.g. websocket) and message passing between processes across nodes with minimal latency… and distributed erlang is a great feat.
  • Some applications use Horde for a global registry, and we have an issue on network split (link) - but other than that it works well as designed.

For services not using erlang cluster:

  • To distribute async jobs - we use Oban. This is our preferred choice if a service already requires PostgreSQL; having persistent, auditable records is often “required”.
    • Oban also has “unique” job and cron job features, so we don’t need to build a global lock or global scheduler, which is very nice.
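
A small sketch of what the unique-job part looks like; the worker name, queue, and period are assumptions:

```elixir
defmodule MyApp.Workers.SyncUsage do
  # unique: identical jobs enqueued within 60 seconds are deduplicated.
  use Oban.Worker, queue: :default, unique: [period: 60]

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"account_id" => account_id}}) do
    # ... do the periodic work for this account ...
    {:ok, account_id}
  end
end

# Enqueue it; inserting a duplicate within the unique period is a no-op.
%{account_id: 123}
|> MyApp.Workers.SyncUsage.new()
|> Oban.insert()
```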

On docker:

  • We have our own base image (compiling Erlang, reusing precompiled Elixir). Some of the code is available here
  • We haven’t hit any issues from erlang/elixir on docker.

On k8s:

  • k8s is not easy - lots of things to learn, and the CI/CD and local dev ecosystem is still evolving… but it does make sense in certain circumstances :slight_smile:

Please keep in mind that you don’t need an erlang cluster for scaling. Running independent instances behind a load balancer is totally fine.

You WILL NEED an erlang cluster when you choose to scale using features only available in an erlang cluster. For example, I’ve seen a presentation about adding a global cache on top of the database to reduce latency.

5 Likes