Using :dns_cluster with docker-compose locally (it can be done)

Warmest greetings, comrades.

I recently started using :dns_cluster (phoenixframework/dns_cluster on GitHub, “Simple DNS clustering for distributed Elixir nodes”) in a couple of production projects which are deployed on Fly. These all suffer from various “distributed” problems (not because of dns_cluster, just the normal problems of distributed systems). I have been trying to emulate the setup locally with minimal setup for new devs. I did not want to use a different clustering lib in dev (e.g. libcluster) because I think these kinds of differences between prod and dev make debugging almost impossible.

So I have a working project using docker-compose where I have 2 instances of the same Phoenix application with static IP addresses on a custom docker network and with node long-names of the form app_name@ip_address. This mimics the standard long-names seen on fly.io machines.

I thought about writing a blog post about it but time is very precious these days. I am already overloaded. So I am posting this here, hopefully with adequate tags and keywords to be found by someone searching.

Broad strokes what to do:

2 services in docker-compose.override.yaml

name: myapp
services:
  myapp1: &myapp_mapping
    ports:
      - "4000:4000"
    env_file: ../docker/myapp/dev.env
    environment:
      IP_V4_ADDRESS: 192.0.1.11
    volumes:
      - ../assets:/app/assets
      - ../config:/app/config:ro
      - ../lib:/app/lib:ro
      - ../priv:/app/priv
      - ../seeds:/app/seeds:ro
      - ../test:/app/test:ro
      - ../mix.exs:/app/mix.exs:ro
      - ../mix.lock:/app/mix.lock:ro
      - ../.iex.exs:/app/.iex.exs:ro
    networks:
      erlcluster:
        ipv4_address: 192.0.1.11

  myapp2:
    <<: *myapp_mapping
    ports:
      - "4001:4000"
    environment:
      IP_V4_ADDRESS: 192.0.1.12
    networks:
      erlcluster:
        ipv4_address: 192.0.1.12

networks:
  erlcluster:
    ipam:
      driver: default
      config:
        - subnet: "192.0.1.0/24"

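Note that a custom network is created. The intent is to use an IP range reserved for documentation and testing purposes; strictly speaking, the range reserved for that is 192.0.2.0/24 (TEST-NET-1, RFC 5737), so you may prefer that subnet over 192.0.1.0/24. The volume mounts are a trick to get hot-reloading.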

Use the IPv4 address as the host part of the node name in your docker entrypoint, e.g.

iex --name myapp@$1 --cookie $2 -S mix phx.server

Here $1 is the IPv4 address and $2 is the cookie; the cookie is in the dev.env file.

Note that both nodes have the same name part (“myapp”) but different host parts (the IPv4 address).
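
To make the naming scheme concrete, here is a minimal sketch of how peer node names can be derived from the local node name plus a list of resolved IP addresses. As far as I understand it, this is roughly the idea behind DNS-based discovery, but NodeNameSketch and its functions are made up for illustration and are not part of dns_cluster.

```elixir
defmodule NodeNameSketch do
  @moduledoc false

  # Returns the part of a long node name before the "@".
  def basename(node_name) when is_atom(node_name) do
    [basename, _host] = String.split(Atom.to_string(node_name), "@")
    basename
  end

  # Builds one candidate node name per IP address, reusing our own basename.
  # The list may include our own node name; a real implementation would skip it.
  def candidates(self, ips) when is_atom(self) and is_list(ips) do
    base = basename(self)
    for ip <- ips, do: :"#{base}@#{:inet.ntoa(ip)}"
  end
end

NodeNameSketch.candidates(:"myapp@192.0.1.11", [{192, 0, 1, 11}, {192, 0, 1, 12}])
```

Because every container uses the same name part, knowing the IP addresses is enough to know the full node names.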

So that is it in broad strokes.

The next trick is to write a custom DNS resolver module for DNSCluster, like this:

defmodule MyApp.DevDNSClusterResolver do
  @moduledoc false

  require Record

  Record.defrecord(:hostent, Record.extract(:hostent, from_lib: "kernel/include/inet.hrl"))

  def basename(node_name) when is_atom(node_name) do
    [basename, _] = String.split(to_string(node_name), "@")
    basename
  end

  def connect_node(node_name) when is_atom(node_name), do: Node.connect(node_name)

  def list_nodes, do: Node.list(:visible)

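  # `query` may hold several whitespace-separated hostnames, e.g. "myapp1 myapp2".
  def lookup(query, type) when is_binary(query) and type in [:a, :aaaa] do
    query
    |> String.split()
    |> Enum.reduce([], fn host, acc ->
      # Resolve each hostname on its own and collect every address found.
      case :inet_res.getbyname(~c"#{host}", type) do
        {:ok, hostent(h_addr_list: addr_list)} -> addr_list ++ acc
        {:error, _} -> acc
      end
    end)
  end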
end

This is almost identical to the normal DNSCluster.Resolver module except for the lookup function, which calls String.split on the query. This hack allows us to specify multiple hosts to check for IP addresses. Note we differ from a fly.io deployment here because we don’t have a single DNS name that resolves to every instance (they have, e.g. myapp.internal or something like that). That’s why we need multiple calls to :inet_res.getbyname.
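
The split-then-resolve idea can be tried without any DNS at all. The sketch below swaps :inet_res.getbyname for an injectable function backed by a plain map, so the module name, the `resolve` argument and the fake records are all made up for illustration. It also uses Enum.flat_map instead of the reduce above, which keeps addresses in query order.

```elixir
defmodule SplitLookupSketch do
  @moduledoc false

  # `resolve` is a stand-in for the real DNS call: it takes a hostname and
  # returns {:ok, [ip_tuple]} or {:error, reason}.
  def lookup(query, resolve) when is_binary(query) and is_function(resolve, 1) do
    query
    |> String.split()
    |> Enum.flat_map(fn host ->
      case resolve.(host) do
        {:ok, addrs} -> addrs
        {:error, _reason} -> []
      end
    end)
  end
end

# Fake "DNS" records for the two compose services.
fake_dns = %{"myapp1" => [{192, 0, 1, 11}], "myapp2" => [{192, 0, 1, 12}]}

resolve = fn host ->
  case Map.fetch(fake_dns, host) do
    {:ok, addrs} -> {:ok, addrs}
    :error -> {:error, :nxdomain}
  end
end

# The unknown host is skipped, the two known hosts each contribute an address.
SplitLookupSketch.lookup("myapp1 myapp2 missing", resolve)
```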

This is actually a good thing because we can choose which server will serve our requests since they map to different ports. Server 1 is at localhost:4000, server 2 is at localhost:4001.

Finally, you need to have this env var in, e.g., dev.env:

DNS_CLUSTER_QUERY="myapp1 myapp2"

As mentioned, this will be split by our custom resolver and both IPv4 addresses are obtained.
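
One piece not shown above is how the custom resolver gets plugged into DNSCluster. A sketch, assuming DNSCluster accepts a :resolver option in its start options (an assumption that is not confirmed anywhere in this thread, so check the source of the version you have installed). MyApp.ClusterConfig is a hypothetical helper; the tuple it returns would go into the children list in application.ex.

```elixir
defmodule MyApp.ClusterConfig do
  @moduledoc false

  # Builds the DNSCluster child tuple from the environment.
  # ASSUMPTION: `:resolver` is accepted by DNSCluster; verify before relying on it.
  # `env` defaults to the real environment and is injectable for experiments.
  def dns_cluster_child(env \\ System.get_env()) do
    case Map.get(env, "DNS_CLUSTER_QUERY") do
      # No query configured: tell DNSCluster not to start.
      nil -> {DNSCluster, query: :ignore}
      "" -> {DNSCluster, query: :ignore}
      query -> {DNSCluster, query: query, resolver: MyApp.DevDNSClusterResolver}
    end
  end
end

MyApp.ClusterConfig.dns_cluster_child(%{"DNS_CLUSTER_QUERY" => "myapp1 myapp2"})
```

The whole space-separated string is passed through as the query, and the splitting happens inside the resolver's lookup function.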

So that’s how I got it to work. I am testing using :pogo for global singletons, regional singletons, etc. and it is nice to have 2 clustered apps by default.
It is especially comforting to my mind that they are clustering using the same lib as prod. Like I said, I can choose which server to manually test on (ports 4000 vs 4001) and I can attach to whatever running server I want.

This is not a neat write-up but there is enough info for anyone else who wants this kind of local setup for dns_cluster clustering with docker-compose. I am happy to provide more info.

This is pretty amazing, thank you.

@slouchpie thanks a lot! It really helped.

I did some experiments using libcluster; it works seamlessly as well, and you don’t even need a resolver script.

At first it didn’t work and it took me a while to figure out the problem.
I initially tried to play with the dns and dns_search attributes in docker-compose, but actually just setting hostname was enough for both nodes to communicate.

Cheers

Wondering, what would be the reasons to use dns_cluster over libcluster?

I’m not sure I understand your question… is it incompatible to use both?

From my further digging into it (there’s a part in a talk by Chris McCord on dns_cluster, btw), dns_cluster is a simpler dependency, but both solve the same problem, so I don’t see why you would use both. The libcluster lib appears more extensible. E.g., someone wrote a Postgres node discovery strategy for it, which is pretty cool.

I think dns_cluster is a response to the default EPMD shipped with OTP, as it’s not completely trivial to get that set up and running correctly, so it makes it fairly trivial to set up some basic distribution for your Phoenix app, especially if you use a service like fly.io that has full support for this out of the box.

I would say that if you need anything more than the basic use case for connecting nodes, you should always use libcluster, as that library not only has more node discovery strategies but can also handle things like automatic reconnection.

That doesn’t really make sense. With clustering you have two things to handle: Figuring out which nodes to connect to and figuring out a port to use to establish the connection to that node.

libcluster and dns_cluster come with various answers to the former. OTP itself either lets you hardcode predefined node names or lets you do what those libraries do: connect to whichever nodes you want dynamically at runtime.

epmd is one way to do the latter: when connecting to a host, epmd is asked on its well-known port which port the target node is listening on, and the connection is then established on that port. The other option, usable in newer versions of OTP, is hardcoding a single port to use per node (no epmd needed), which simplifies the connection process, but means you can only run a single Erlang node per host (per cluster).

So by default you still use epmd with libcluster as well as dns_cluster.

Thanks for the details. Talking about the first task (node discovery), logically thinking, I do not seem to need either libcluster or dns_cluster when my topology is static, right? That is, if I, at deployment, know the network address of each node forming a cluster, I don’t need to “discover” them?

Agree, I have no idea what I’m talking about :smiley: .

I based my assumption on some previous issues that were discussed around epmd not respecting how dns resolution on a system works, so my first idea was that dns_cluster fixed that.

Can you clarify: this does not apply to the latest, epmd-less OTPs, right? I ask because clustering seems to work fine for me without any epmd daemon running on the host.

No, you don’t. Libcluster has a strategy for a list of static nodes, matching what OTP does, iirc.
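
For the fully static case without any discovery library, a minimal sketch could look like the following. It assumes the node names are known up front. StaticClusterSketch and the injectable `connect` argument are made up for illustration; only Node.connect/1 is the real call, and it returns true, false or :ignored.

```elixir
defmodule StaticClusterSketch do
  @moduledoc false

  # Tries to connect to every node in a fixed list, skipping ourselves.
  # `connect` defaults to Node.connect/1; it is injectable only so the
  # sketch can be exercised without real distribution.
  def connect_all(nodes, connect \\ &Node.connect/1) when is_list(nodes) do
    nodes
    |> Enum.reject(&(&1 == node()))
    |> Map.new(fn name -> {name, connect.(name)} end)
  end
end

# Example with a fake connect function that "succeeds" for one node only.
fake_connect = fn name -> name == :"myapp@192.0.1.12" end

StaticClusterSketch.connect_all(
  [:"myapp@192.0.1.11", :"myapp@192.0.1.12"],
  fake_connect
)
```

Unlike the clustering libraries, this runs once and does nothing about nodes that come up later or drop out.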

epmd does port mapping. It has nothing to do with hostname resolution. It just deals with multiple Erlang nodes on a single host, mapping them to ports and informing other nodes what the selected ports are.
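
To see that name-to-port mapping for yourself, you can ask the local epmd what it has registered. A small sketch using :net_adm.names/0 from OTP; EpmdSketch is a made-up wrapper, and the result depends entirely on what is running on your machine.

```elixir
defmodule EpmdSketch do
  @moduledoc false

  # Asks the local epmd which node names it knows and on which ports.
  # Returns a list of {name, port} pairs, or [] when epmd is not reachable.
  def registered_nodes do
    case :net_adm.names() do
      {:ok, names} -> for {name, port} <- names, do: {List.to_string(name), port}
      {:error, _reason} -> []
    end
  end
end

EpmdSketch.registered_nodes()
```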

It does. OTP added the ability to disable epmd in favor of a hardcoded port, but that’s not a default. You need to opt into that.

I looked into all built-in strategies, and it seems that the one you refer to is actually named Cluster.Strategy.Epmd, also being the simplest one.
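
For reference, a sketch of what a static topology with that strategy might look like, reusing the two node names from the compose setup earlier in the thread. MyApp.StaticTopologySketch and MyApp.ClusterSupervisor are hypothetical names; the child tuple only works with the :libcluster dependency installed.

```elixir
defmodule MyApp.StaticTopologySketch do
  @moduledoc false

  # Topology for libcluster's Cluster.Strategy.Epmd with a fixed host list.
  def topologies do
    [
      static: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"myapp@192.0.1.11", :"myapp@192.0.1.12"]]
      ]
    ]
  end

  # Child tuple to add to the supervision tree in application.ex.
  def cluster_child do
    {Cluster.Supervisor, [topologies(), [name: MyApp.ClusterSupervisor]]}
  end
end

MyApp.StaticTopologySketch.topologies()
```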

I used :dns_cluster in this “guide” for only 1 reason: because it is in the Fly “clustering” tutorial

re: :libcluster vs :dns_cluster,
I think

  • libcluster is complicated and featureful (multiple, configurable strategies and topologies)
  • dns_cluster is simple with no features (it’s just a single small file).

It is easier to transition from :dns_cluster to :libcluster than vice versa.