Are you using a clustered Elixir deployment?

hubertlepicki · July 12, 2020, 8:12am

Well, I think we are mixing two concepts. There are at least two ways to do what we are talking about here.

Use Erlang / Elixir releases with hot code upgrades.

To be fair, I haven’t used this method in production. But it works when you have a set of servers that you deploy your application to, and the updated application runs on those same servers. By servers I mean OS instances, as in an EC2 instance or a real hardware server, that stay the same during deployment. This method doesn’t really care about clustering, which may or may not happen in parallel to hot upgrades. It’s just deploying a new version of the code to the same servers that were running the previous version, and there are hooks in GenServers and friends to handle passing state between the ‘old version’ and the ‘new version’ of the code that was just deployed: https://hexdocs.pm/mix/Mix.Tasks.Release.html#module-hot-code-upgrades
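For the hot-upgrade route, the GenServer hook is the `code_change/3` callback. Below is a minimal sketch, assuming a hypothetical `MyApp.Counter` whose old version kept a bare integer as its state and whose new version keeps a map. It matches on the shape of the old state rather than on the exact version term, and it leaves out the appup/relup files that actually drive an upgrade.

```elixir
defmodule MyApp.Counter do
  # Hypothetical GenServer, used only to illustrate code_change/3.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, 0, opts)

  # New version: the state is a map instead of a bare integer.
  @impl true
  def init(count), do: {:ok, %{count: count, label: "default"}}

  # On a downgrade, OTP passes {:down, vsn} as the first argument;
  # turn the map back into the integer the old code expects.
  @impl true
  def code_change({:down, _vsn}, %{count: count}, _extra), do: {:ok, count}

  # On an upgrade, convert an old integer state into the new map shape.
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count, label: "default"}}
  end

  # Anything else is assumed to be in the new shape already.
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end
```

Matching on the state's shape keeps the sketch independent of which version term the release handler ends up passing in.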

Again, these days most of the things I work on are not deployed to such static/dedicated servers, but to VMs created on demand by some piece of infrastructure and discarded after the application shuts down. This is how anything Docker-based or Kubernetes-based works.

Use clustering: no hot code upgrades, just a cluster in which instances start and shut down.

This method is suitable for passing in-memory state on deployments when you use something like Kubernetes. When you deploy a new version to the cloud, the old instance(s) of the application keep running in their own containers. During the deployment, the piece of infrastructure you use creates new containers for the new release. These start their own little OS instances and run the application. This is the moment when your infrastructure can establish a link between the old version of the application and the new one, so you can pass state. Again, I believe Gigalixir does that by default.

We always use https://github.com/bitwalker/libcluster to handle cluster formation, but that detail is not really important here.
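For completeness, a minimal libcluster setup following the pattern in its README: a topology keyword list handed to `Cluster.Supervisor` in the application's supervision tree. `MyApp` and the node names are placeholders. `Cluster.Strategy.Epmd` just connects to a fixed list of hosts, which is handy locally; libcluster also ships Kubernetes-oriented strategies for the container setup described above.

```elixir
# Hypothetical application module; requires {:libcluster, ...} in mix.exs deps.
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    topologies = [
      # Epmd strategy: connect to a fixed list of node names.
      local: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"a@127.0.0.1", :"b@127.0.0.1"]]
      ]
    ]

    children = [
      # libcluster's supervisor forms and maintains the cluster.
      {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}
      # ... the rest of the application's children go here
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```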

What is important is that you can listen to events when nodes join or leave the cluster. Example code can be seen here: https://github.com/smartcitiesdata/horde_connector/blob/master/lib/horde_connector.ex#L40

So you monitor the nodes in your cluster from some process, get events when a node joins or leaves, and can then decide to pass state to a newly started node by sending a message containing that state to some process running on that node.
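A minimal sketch of that monitor-and-push approach, using OTP's `:net_kernel.monitor_nodes/1` to receive `{:nodeup, node}` and `{:nodedown, node}` messages. `MyApp.NodeWatcher`, the `{:handoff, state}` message, and the assumption that the same process is registered under the same name on every node are all hypothetical. Both the old and the new node see a `:nodeup` for each other, so some rule is needed about who sends and whose values win; here a node only pushes non-empty state and keeps its own values over incoming ones.

```elixir
defmodule MyApp.NodeWatcher do
  # Hypothetical process that pushes its in-memory state (a map) to the
  # process registered under the same name on any node that joins.
  use GenServer
  require Logger

  def start_link(initial_state) when is_map(initial_state) do
    GenServer.start_link(__MODULE__, initial_state, name: __MODULE__)
  end

  @impl true
  def init(initial_state) do
    # Subscribe this process to {:nodeup, node} / {:nodedown, node} messages.
    :ok = :net_kernel.monitor_nodes(true)
    {:ok, initial_state}
  end

  @impl true
  def handle_info({:nodeup, node}, state) do
    # Only push if there is something to hand over. Sending to
    # {registered_name, node} is fire-and-forget: if nothing is registered
    # under that name there, the message is silently dropped.
    if map_size(state) > 0 do
      send({__MODULE__, node}, {:handoff, state})
    end

    {:noreply, state}
  end

  def handle_info({:nodedown, node}, state) do
    Logger.info("node left the cluster: #{inspect(node)}")
    {:noreply, state}
  end

  # Receiving side: merge incoming state, keeping local values on conflict.
  def handle_info({:handoff, incoming}, state) do
    {:noreply, Map.merge(incoming, state)}
  end
end
```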

There are at least two projects that abstract away most of the details here and do a lot of the legwork for you: Swarm and Horde. With both you can start processes on the nodes in the cluster, and they react to changes in cluster membership, providing hooks to pass state. With Swarm it’s a bit easier (https://hexdocs.pm/swarm/readme.html#process-handoff), but we observed some nondeterministic behavior there, i.e. bugs. In theory it’s super sweet, however, and the API is really nice.
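With Swarm, the handoff hooks arrive as messages to the registered process. Below is a minimal sketch of a worker handling them, based on the callbacks described in the Swarm process-handoff docs linked above. `MyApp.SwarmWorker` and its state layout are hypothetical, and registration through Swarm (for example via `Swarm.register_name/4`) is left out so the callbacks can be read on their own.

```elixir
defmodule MyApp.SwarmWorker do
  # Hypothetical worker; in a real app it is started and registered through
  # Swarm so Swarm can move it when the cluster topology changes.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  @impl true
  def init(name), do: {:ok, %{name: name, data: %{}}}

  # Swarm asks the old process how to hand off. {:resume, payload} passes
  # payload to the replacement process; :restart and :ignore are the
  # other documented replies.
  @impl true
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, {:resume, state.data}, state}
  end

  # Delivered to the new process after it has started, with the payload.
  @impl true
  def handle_cast({:swarm, :end_handoff, data}, state) do
    {:noreply, %{state | data: data}}
  end

  # After a netsplit heals, a duplicate's state is offered to us; this
  # sketch merges it, keeping local values on conflict.
  def handle_cast({:swarm, :resolve_conflict, other_data}, state) do
    {:noreply, %{state | data: Map.merge(other_data, state.data)}}
  end

  # Swarm tells the old process to shut down once it has been moved.
  @impl true
  def handle_info({:swarm, :die}, state) do
    {:stop, :shutdown, state}
  end
end
```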

You can do the same with Horde (https://github.com/derekkraan/horde), with a bit more legwork: https://hexdocs.pm/horde/state_handoff.html
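With Horde, the pattern in the linked guide is for the process to save its state somewhere shared when it shuts down and reload it when Horde restarts it on another node. A minimal sketch of that pattern, assuming a replicated key-value store (the guide builds this on a CRDT, as far as I can tell). Here a hypothetical `MyApp.HandoffStore` backed by a plain, node-local `Agent` stands in, so the sketch only demonstrates the call pattern on a single node. `MyApp.HordeWorker` is hypothetical too.

```elixir
defmodule MyApp.HandoffStore do
  # Hypothetical stand-in for a cluster-wide key-value store. An Agent is
  # local to one node, so this only illustrates the call pattern.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> %{} end, name: __MODULE__)
  def put(key, value), do: Agent.update(__MODULE__, &Map.put(&1, key, value))
  # Returns the stored value (or nil) and removes it from the store.
  def pop(key), do: Agent.get_and_update(__MODULE__, &Map.pop(&1, key))
end

defmodule MyApp.HordeWorker do
  # Hypothetical worker meant to run under Horde.DynamicSupervisor.
  use GenServer

  def start_link(key), do: GenServer.start_link(__MODULE__, key)

  @impl true
  def init(key) do
    # Trap exits so terminate/2 runs when the supervisor shuts us down,
    # e.g. because this node is leaving the cluster.
    Process.flag(:trap_exit, true)
    # Pick up state left behind by a previous incarnation, if any.
    data = MyApp.HandoffStore.pop(key) || %{}
    {:ok, %{key: key, data: data}}
  end

  @impl true
  def handle_call({:put, k, v}, _from, state) do
    {:reply, :ok, %{state | data: Map.put(state.data, k, v)}}
  end

  @impl true
  def terminate(_reason, state) do
    # Park the state so the replacement process can load it in init/1.
    MyApp.HandoffStore.put(state.key, state.data)
    :ok
  end
end
```

In a real cluster the worker would be started with `Horde.DynamicSupervisor.start_child/2` so Horde can restart it elsewhere, and the store has to be readable from whichever node the replacement lands on.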