Docker Swarm + Phoenix Channels

Hi friends!

My team is going to use Docker Swarm in production, and I made a simple chat application to test Channels communication/broadcasting across containers.

The problem is: although my test app ran well in a single container, the websocket communication doesn't seem to work when I use Docker Swarm replication. I send some messages from a browser, and they don't show up in another browser window (currently I'm using VMs to test it).

Does anyone have any experience using Phoenix Channels behind a load balancer?

Thanks in advance!

2 Likes

The most likely reason you are not seeing messages is that the different BEAMs in the different containers are not set up as a single cluster, but are each running on their own. So: how are you setting up the Elixir (BEAM, really) cluster?

The nodes need to know where the other members of the cluster are. I don't know how Docker Swarm works in detail (I'm more familiar with Kubernetes and that stack), but here are some pointers to how it is done with that tooling:

https://substance.brpx.com/clustering-elixir-nodes-on-kubernetes-e85d0c26b0cf
https://github.com/bitwalker/libcluster

I assume it will be similar with Swarm, with the usual "differences in all the details", but the tl;dr is that each node needs to know where the rest of the cluster is so the nodes can connect to each other. Once that is achieved, you'll see messages across the cluster.
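For reference, this is roughly what a DNS-based topology looks like with libcluster; its DNSPoll strategy can also resolve Docker Swarm's tasks.<service name> entry. This is just a sketch, and the service name, node basename and app name below are placeholders:

# config/prod.exs
config :libcluster,
  topologies: [
    swarm_dns: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 5_000,
        # Docker Swarm resolves tasks.<service> to the IPs of all replicas
        query: "tasks.myservice",
        # nodes are expected to be named myapp@<ip>
        node_basename: "myapp"
      ]
    ]
  ]

You'd also start Cluster.Supervisor with these topologies in your application's supervision tree.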

4 Likes

Thanks! I'm currently using something with epmd, and I'm still looking into a way to achieve dynamic clustering with Swarm. :slight_smile:

Try Peerage. Though it’s designed to work with Kubernetes, it also works with Docker Swarm.

You need to understand many things before you can successfully deploy a replicated, stateful Phoenix web app.

Here’s my approach:

1. Include peerage into the dependencies

# mix.exs
def deps do
  [
    {:peerage, "~> 1.0.2", only: :prod}
  ]
end

2. Configure peerage

# config/prod.exs
config :peerage, via: Peerage.Via.Dns,
  dns_name: "tasks.myservice",
  app_name: "myapp"

Peerage consults DNS for the name specified in dns_name to get the IP addresses of all the replicas in the service. In a swarm, Docker provides a DNS entry for every service at tasks.<service name>, which resolves to the IPs of all its running replicas. If you attach to a running replica and run nslookup tasks.<service name>, it will list the IP addresses of all replicas. Peerage then prepends app_name to each IP address and tries to connect via Node.connect, and it repeats this lookup-then-connect procedure periodically.

For example, if your Phoenix app has 3 replicas, and at some point they happen to run on 10.0.1.2, 10.0.1.3 and 10.0.1.4, Peerage will get these IPs by calling :inet_res.lookup('tasks.myservice', :in, :a) under the hood, and will try to connect to :"myapp@10.0.1.2", :"myapp@10.0.1.3" and :"myapp@10.0.1.4". So the dns_name should be "tasks.<docker service name>", and the app_name should be the part before the @ in the environment variable $RELEASE_NODE (see below). Since Peerage uses deferred_config, maybe app_name: {:system, "RELEASE_NAME"} will work too; I haven't tried it yet.
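Roughly what that lookup-then-connect loop does, shown in an iex session (the IPs are just the ones from the example above):

iex> :inet_res.lookup('tasks.myservice', :in, :a)
[{10, 0, 1, 2}, {10, 0, 1, 3}, {10, 0, 1, 4}]

iex> for ip <- :inet_res.lookup('tasks.myservice', :in, :a) do
...>   Node.connect(:"myapp@#{ip |> Tuple.to_list() |> Enum.join(".")}")
...> end
[true, true, true]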

3. Make the replicas know their own node names

# rel/env.sh.eex
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE="<%= @release.name %>@$(hostname -i)"

Don’t know how to do this in batch script, sorry.
hostname -i will give you the IP address of the current host.

After that, just build and deploy as usual. You can verify the result by running ./bin/myapp rpc "IO.inspect Node.list" in any container of the replicas.

1 Like

Now I actually have a question: since Peerage connects all the replicas after the Phoenix app starts, what will happen if I create a global GenServer and start it at the same time as my Repo?

Besides, if that global GenServer starts successfully, what will happen when a network split occurs?

So this is an interesting topic you are bringing up.

If you are using Phoenix channels in a cluster, it is likely using the default PG2 back-end. This back-end uses the pg2 library from Erlang, which is part of the standard library.

The library is pretty minimal yet smart in what it does: your local processes can join a distributed process group, and based on that Phoenix PubSub sends messages directly to these joined processes.

pg2 monitors the nodes that joined a given group and also handles netsplits and recovery / topology changes accordingly, so your nodes can drop, reconnect and rejoin the cluster and Phoenix channels should work as expected. As in: the nodes that drop out or become unresponsive might miss some messages, but otherwise things will heal on their own.
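To illustrate what the PG2 back-end is doing at the pg2 level (the group name and message here are made up):

# :pg2 ships with Erlang/OTP (it was later deprecated in favour of :pg)
:ok = :pg2.create(:chat_room)        # create the distributed group (idempotent)
:ok = :pg2.join(:chat_room, self())  # join the current process on this node

# Any node in the cluster can now fan a message out to every joined process:
for pid <- :pg2.get_members(:chat_room) do
  send(pid, {:broadcast, "hello from #{Node.self()}"})
end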

More about this in this thread:

and also in this overview of pg2 failure mechanisms:
http://christophermeiklejohn.com/erlang/2013/06/03/erlang-pg2-failure-semantics.html

Now, the situation is slightly different if you want to start your own GenServers and make them visible in the cluster. The main question I would have is: do you want them to be unique? And if so: unique per node or unique in the cluster?

It's easier to have GenServers that are unique per node, and then you can actually use the same pg2 to get some sort of clustering/balancing/message passing between these and other processes in the cluster. Go for it if you can.
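A minimal sketch of that pattern, with made-up module and group names: the GenServer is registered under a local name, so each node runs exactly one copy, and it joins a pg2 group so the copies on other nodes can be reached:

defmodule MyApp.NodeLocalWorker do
  use GenServer

  # Unique per node: registered under a local name, one copy per node.
  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    # Join a pg2 group so peers on other nodes can reach every copy.
    :ok = :pg2.create(:node_local_workers)
    :ok = :pg2.join(:node_local_workers, self())
    {:ok, opts}
  end

  @impl true
  def handle_info({:broadcast, msg}, state) do
    IO.inspect(msg, label: "received on #{Node.self()}")
    {:noreply, state}
  end
end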

If you have to make the processes unique in the cluster, you are in a bit of a worse position. Unfortunately, it is very often the case that you need to do so.

I don't know the peerage library, but I suspect it's similar to libcluster, which I use: it just provides a mechanism to connect/reconnect/change the topology of the cluster of Erlang nodes and nothing more.

You need something more if you want global, cluster-wide registration of unique processes. :global is one obvious choice, but if you are worried about netsplits & recovery, it is not going to work, because it just locks up / stops working on a netsplit and, as far as I understand, there's no way to recover either. I tend to rule :global out for any cloud-based deployment for that reason. It's great if you have two nodes hooked together with a gigabit ethernet cable, but anything in the cloud should expect, and will experience, networking failures between the nodes.
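For completeness, registering a cluster-unique GenServer with :global looks like this (the module name is a placeholder):

# Registers the process under a cluster-wide name; only one node holds it.
{:ok, _pid} = GenServer.start_link(MyApp.Singleton, [], name: {:global, MyApp.Singleton})

# Callers on any connected node can reach it through the same name:
GenServer.call({:global, MyApp.Singleton}, :ping)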

I can see two tools you can use in that situation: Swarm (https://github.com/bitwalker/swarm) and Horde (https://github.com/derekkraan/horde).

Both are pretty similar; Swarm has more of its own API, while Horde tries to be compatible with Elixir's Registry. I've used Swarm a lot and only a tiny bit of Horde so far, so I can't tell you much about production experience with the latter. Swarm (with libcluster) works great.

With both tools you can register GenServers (and other processes) as unique in the cluster, handle netsplits/topology changes where the cluster breaks in half, configure a minimum cluster quorum, and also handle recovery when two briefly disconnected nodes re-connect. The last feature is pretty cool, because you can have situations where two nodes are briefly running independently and each started its own "global" GenServer, and when they re-connect you want to ensure only one is left running and merge the state of both. With both Horde and Swarm you can do it.
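Here is a sketch of what that looks like with Swarm; the module name and the merge logic are placeholders, and in a real app you'd typically start the process through a supervisor rather than directly. Swarm starts the process via an MFA and calls back into it on handoff and when a conflict is found after a netsplit heals:

defmodule MyApp.GlobalWorker do
  use GenServer

  # Swarm picks the node that runs the process and restarts it elsewhere on failure.
  def start_global(name) do
    Swarm.register_name(name, GenServer, :start_link, [__MODULE__, %{}])
  end

  @impl true
  def init(state), do: {:ok, state}

  # Called before the process is moved to another node: hand the state over.
  @impl true
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, {:resume, state}, state}
  end

  # Called on the new node with the handed-off state.
  @impl true
  def handle_cast({:swarm, :end_handoff, old_state}, _state) do
    {:noreply, old_state}
  end

  # Called when a netsplit heals and two copies exist: merge and keep one.
  @impl true
  def handle_cast({:swarm, :resolve_conflict, other_state}, state) do
    {:noreply, Map.merge(state, other_state)}
  end

  @impl true
  def handle_info({:swarm, :die}, state), do: {:stop, :shutdown, state}
end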

I think Swarm may rely on libcluster, so I'm not sure if it'll work well with the clustering library you use. Probably the same for Horde.

1 Like

Many thanks, @hubertlepicki, that’s a very informative answer. I’ll try these solutions this weekend. Thanks again.

Yet another question: we know that Node.connect/1 accepts an atom that is the node name to connect to. In an orchestrated environment, container IP addresses tend to change across reboots, so the service-discovery procedure will keep creating new atoms, and atoms are not garbage collected. Is this a potential memory leak? Or, since reboots are relatively rare, can this leak be neglected?

The problem would only be visible if you are in a hot-deployment environment and do many, many deployments, right? Otherwise, on a cold-stop deployment, the atom table gets nuked and you start fresh when the new container starts.
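If you want to keep an eye on it, the runtime exposes the current and maximum number of atoms; the default limit is 1,048,576, so a handful of new node names per redeploy is unlikely to matter (the count below is illustrative):

iex> :erlang.system_info(:atom_count)
9742
iex> :erlang.system_info(:atom_limit)
1048576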