Connect nodes using Libcluster with docker-compose

I’m trying to use libcluster to connect Elixir nodes. Here is a simplified version of my docker-compose file.

 version: "3.7"
 services:
   application_1:
     build:
       context: application_1/.
     ports:
       - "4321:4000"
   application_2:
     build:
       context: application_2/.
     ports:
       - "1234:4000"

Everything starts perfectly and each service works fine. When I’m connected to application_1, I’m able to ping application_2 from the bash terminal.

But when I try to connect the nodes (using libcluster, or simply using Node.connect(:application_1@application_1)), nothing seems to work. I also tried setting the same cookie for both applications, but Node.list is still empty.

Does anyone have an idea of how I could connect those two nodes manually, or even better, using libcluster?

Thanks :slight_smile:


So, let’s start with the manual connection in docker. Once it works on your side we can move to libcluster :slight_smile:

We run two iex sessions in two different containers, connected to the same elixir network, passing the same cookie via the ERLANG_COOKIE env variable.

First, let’s create the bridge network (something docker-compose creates automatically):

docker network create elixir

Then run the first container in a terminal, attaching it to the elixir network. The container’s name is app1, which is also its DNS name inside the network.

The command we run in the container is iex --sname app@app1 --cookie ${ERLANG_COOKIE}

$ docker container run -it  --rm --network elixir  \
  -e ERLANG_COOKIE="its_a_secret" \
  --name app1 \
   elixir:1.9 \
  bash -c 'iex --sname app@app1 --cookie ${ERLANG_COOKIE}'

Then we run the second container, app2, in another terminal:

$ docker container run -it  --rm --network elixir  \
  -e ERLANG_COOKIE="its_a_secret" \
  --name app2 \
   elixir:1.9 \
  bash -c 'iex --sname app@app2 --cookie ${ERLANG_COOKIE}'

and from it we connect to app@app1:

iex(app@app2)1> Node.connect(:app@app1)
true
iex(app@app2)2> Node.list
[:app@app1]

Now we can test the connection with docker-compose, using a simple script app.exs I made for this case. We make it available in both containers via a bind mount.

docker-compose.yaml

version: "3.7"
services:
  app1:
    image: "elixir:1.9"
    environment:
      ERLANG_COOKIE: "its_a_secret"
    volumes:
      - ./app.exs:/app.exs

    command: ["bash", "-c", "elixir --no-halt --sname app@app1 --cookie $$ERLANG_COOKIE app.exs"]
    # command: ["bash", "-c", "sleep infinity"]

  app2:
    image: "elixir:1.9"
    environment:
      ERLANG_COOKIE: "its_a_secret"
    volumes:
      - ./app.exs:/app.exs
    command: ["bash", "-c", "elixir --no-halt --sname app@app2 --cookie $$ERLANG_COOKIE app.exs"]
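For reference, here is a sketch of what app.exs can look like, reconstructed to match the log output below (the node names are the sname values from the compose file above; the exact code is my reconstruction, not a verbatim copy):

```elixir
# app.exs — each node figures out its peer, connects to it, and pings it in a loop.
IO.puts("I'm here! Sleeping for 2 seconds")
Process.sleep(2_000)

nodes = MapSet.new([:app@app1, :app@app2])

# The "other" node is whatever remains after removing ourselves from the set.
other_node =
  nodes
  |> MapSet.delete(Node.self())
  |> MapSet.to_list()
  |> List.first()
  |> IO.inspect(label: "[self is #{inspect(Node.self())}]")

Node.connect(other_node)
|> IO.inspect(label: "connect (from #{inspect(Node.self())}")

Process.sleep(2_000)

Node.list() |> IO.inspect(label: "nodes")

# Keep pinging forever so we can watch the cluster stay up.
Enum.each(Stream.cycle([:ok]), fn _ ->
  Node.ping(other_node) |> IO.inspect(label: "ping #{inspect(other_node)}")
  Process.sleep(1_000)
end)
```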
Running docker-compose up:

$ docker-compose up
app2_1  | I'm here! Sleeping for 2 seconds
app1_1  | I'm here! Sleeping for 2 seconds
app2_1  | [self is :app@app2]: :app@app1
app2_1  | connect (from :app@app2: true
app1_1  | [self is :app@app1]: :app@app2
app1_1  | connect (from :app@app1: true
app2_1  | nodes: [:app@app1]
app2_1  | ping :app@app1: :pong
app1_1  | nodes: [:app@app2]
app1_1  | ping :app@app2: :pong
app2_1  | ping :app@app1: :pong
...

@Zios does it work on your side?


First of all, thanks for the time you put into your response @alvises, that’s awesome :slight_smile: !

So, this weekend I tried to make progress based on your answer. First of all, everything you said works for me. Based on that, I tried to implement it in my code.

The code I use is a little more complex because I’m using Distillery and Phoenix.

I changed rel/config.exs to use the same cookie in my two applications:

environment :prod do
  set(include_erts: true)
  set(include_src: false)
  set(cookie: :"super_cookie")
  set(vm_args: "rel/vm.args")
  set(post_start_hooks: "rel/post_start_hooks")
end

And in vm.args:

-sname <%= release_name %>@application_1
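For completeness, the whole rel/vm.args here can be as small as the following sketch (the -setcookie line is redundant if the cookie is already set in rel/config.exs; the literal value is only illustrative):

```
## Short node name; the host part must be resolvable by the other node
-sname <%= release_name %>@application_1

## Distribution cookie, identical on every node
-setcookie super_cookie
```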

I tried to use -name the same way as in Distillery’s documentation, but it failed.

And it works! So I will try to use libcluster now :slight_smile:


Today I had time to play with elixir releases, libcluster and docker-compose, trying to make them work together.

Have you tried the Cluster.Strategy.Gossip strategy? It works well in a docker bridge network and it finds the nodes automatically, without defining a fixed node list. This is great if you need to scale up or down without having to pass a fixed list of nodes every time.
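For reference, Gossip works by multicasting UDP heartbeats on the local network, and libcluster exposes a few knobs to tune it. A sketch of a tuned topology (the option names are from libcluster’s docs; the values shown are the documented defaults, except secret, which is illustrative):

```elixir
topologies = [
  gossip: [
    strategy: Cluster.Strategy.Gossip,
    config: [
      # UDP port and interface the heartbeats are sent/received on
      port: 45_892,
      if_addr: "0.0.0.0",
      # multicast group; a TTL of 1 keeps packets on the local network
      multicast_addr: "230.1.1.251",
      multicast_ttl: 1,
      # optional shared secret, so heartbeats from other clusters are ignored
      secret: "somepassword"
    ]
  ]
]
```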

With Elixir releases there is a RELEASE_NODE environment variable to set the node’s sname (is there maybe something similar with Distillery?), so my docker-compose.yaml file looks like this (webapp:3 is the image with the release of the sample app):

version: "3.7"

services:

  app1:
    image: "webapp:3"
    environment:
      RELEASE_NODE: app@app1

  app2:
    image: "webapp:3"
    environment:
      RELEASE_NODE: app@app2

and the lib/application.ex file is pretty simple

defmodule Webapp.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  def start(_type, _args) do
    topologies = [
      default: [
        strategy: Cluster.Strategy.Gossip
      ]
    ]

    children = [
      {Cluster.Supervisor, [topologies, [name: Webapp.ClusterSupervisor]]},
      {Task, fn -> ping_nodes() end}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Webapp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp ping_nodes() do
    Process.sleep(1_000)
    Node.list()
    |> Enum.each(fn node ->
      IO.puts("[#{inspect(Node.self())} -> #{inspect(node)}] #{inspect(Node.ping(node))}")
    end)
    ping_nodes()
  end
end

Each node also starts a Task to ping the other nodes. Running docker-compose up, I get this log:

docker-compose up
Creating network "webapp_default" with the default driver
Creating webapp_app2_1 ... done
Creating webapp_app1_1 ... done
Attaching to webapp_app1_1, webapp_app2_1
app1_1  |
app1_1  | 23:32:25.284 [info]  [libcluster:default] connected to :app@app2
app1_1  | [:app@app1 -> :app@app2] :pong
app2_1  | [:app@app2 -> :app@app1] :pong
app1_1  | [:app@app1 -> :app@app2] :pong
app2_1  | [:app@app2 -> :app@app1] :pong
...

So, it works and it’s really easy to set up. But the problem with docker-compose is that, as far as I know, it’s not possible to have just one app service and dynamically set a different RELEASE_NODE env variable for each replica. So we are forced to create a different service for each node…
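One possible workaround (an untested sketch; /app/bin/webapp is an assumed release path, not from the original post) is to derive the node name from the container’s hostname at startup, since compose gives every replica a unique hostname:

```yaml
services:
  app:
    image: "webapp:3"
    # Set the node name from the container's unique hostname before
    # starting the release ($$ escapes $ for docker-compose).
    command: ["sh", "-c", "RELEASE_DISTRIBUTION=sname RELEASE_NODE=app@$$(hostname) /app/bin/webapp start"]
```

With docker-compose up --scale app=3, each replica would then get a distinct node name, which the Gossip strategy can discover without a fixed node list.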

I wrote an article a few months ago about this; maybe it can be useful (especially if you are interested in deploying with Kubernetes): Connecting Elixir Nodes with libcluster, locally and on Kubernetes. It shows how to connect Phoenix chat nodes in Kubernetes, but with mix, not releases. With Kubernetes it’s possible to dynamically set the env variables using the container’s IP, which is pretty cool :slight_smile:


Hi there,

I would like to achieve the same, but app1 and app2 would be started individually on two different VMs with the hostnames foo and bar, for brevity. foo and bar are in the same VPC (network). How should docker-compose or RELEASE_NODE be configured so that I could use libcluster to create a cluster between app1 and app2?

Thank you!

Found the answer myself, but replying here for posterity. The network_mode: host setting does the trick. After that, it’s just a matter of setting up the release properly and configuring libcluster with the appropriate strategy; I am using a custom one in my case.
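For anyone reading later, a sketch of what that can look like on each VM (assuming the hostnames foo and bar resolve between the VMs; the image name and cookie are illustrative):

```yaml
# docker-compose.yaml on VM "foo" (mirror it on "bar" with RELEASE_NODE=app@bar)
version: "3.7"
services:
  app:
    image: "webapp:3"
    network_mode: host          # the container shares the VM's network stack
    environment:
      RELEASE_DISTRIBUTION: name
      RELEASE_NODE: app@foo     # host part must be resolvable from the other VM
      RELEASE_COOKIE: "its_a_secret"
```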

Thank you for the info offered by the original discussion, it helped!


How would you do this if it were not using sname but rather name?

I.e.:

nodes = MapSet.new([:"a@0.0.0.0", :"b@0.0.0.0"])

other_node =
  nodes
  |> MapSet.delete(Node.self())
  |> MapSet.to_list()
  |> List.first()
  |> IO.inspect(label: "[self is #{inspect(Node.self())}]")

Node.connect(other_node) |> IO.inspect(label: "connect (from #{inspect(Node.self())}")

Process.sleep(2_000)

Node.list() |> IO.inspect(label: "nodes")

Enum.each(1..5, fn _ ->
  Node.ping(other_node)
  |> IO.inspect(label: "ping #{inspect(other_node)}")

  Process.sleep(1_000)
end)

and the docker-compose file:

version: "3.7"
services:
  app1:
    image: "elixir:1.9"
    environment:
      ERLANG_COOKIE: "its_a_secret"
    volumes:
      - ./app.exs:/app.exs

    command: ["bash", "-c", "elixir --no-halt --name a@0.0.0.0 --cookie $$ERLANG_COOKIE app.exs"]

  app2:
    image: "elixir:1.9"
    environment:
      ERLANG_COOKIE: "its_a_secret"
    volumes:
      - ./app.exs:/app.exs
    command: ["bash", "-c", "elixir --no-halt --name b@0.0.0.0 --cookie $$ERLANG_COOKIE app.exs"]

Produces

dual_node_docker_compose % docker compose up
[+] Running 2/0
 ⠿ Container dual_node_docker_compose_app1_1  Created                                                                                                                        0.0s
 ⠿ Container dual_node_docker_compose_app2_1  Created                                                                                                                        0.0s
Attaching to app1_1, app2_1
app2_1  | [self is :"b@0.0.0.0"]: :"a@0.0.0.0"
app2_1  | connect (from :"b@0.0.0.0": false
app1_1  | [self is :"a@0.0.0.0"]: :"b@0.0.0.0"
app1_1  | connect (from :"a@0.0.0.0": false
app2_1  | nodes: []
app2_1  | ping :"a@0.0.0.0": :pang
app1_1  | nodes: []
app1_1  | ping :"b@0.0.0.0": :pang
app2_1  | ping :"a@0.0.0.0": :pang
app1_1  | ping :"b@0.0.0.0": :pang
^CGracefully stopping... (press Ctrl+C again to force)

I’ve tried 127.0.0.1 as well.

Also, when using sname this works fine, so my assumption is it’s related to the host IP and docker.

EDIT: Yeah, it was an issue with the hostname. Using docker’s hostname and providing a fully qualified host for my node worked.
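Concretely, a sketch of that fix for one of the services (my reconstruction, assuming the compose default network; the app1.local network alias is what makes the fully qualified name resolvable by the other container):

```yaml
version: "3.7"
services:
  app1:
    image: "elixir:1.9"
    hostname: "app1.local"       # the container's own FQDN-style hostname
    networks:
      default:
        aliases:
          - app1.local           # make app1.local resolvable by other containers
    environment:
      ERLANG_COOKIE: "its_a_secret"
    volumes:
      - ./app.exs:/app.exs
    command: ["bash", "-c", "elixir --no-halt --name a@app1.local --cookie $$ERLANG_COOKIE app.exs"]
```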

I also went ahead and made an example LiveView + EPMD-less cluster docker app.

Hope it helps any of you. GitHub - joshchernoff/cluster_example: Just trying to start more than one phoenix app.


If you were to deploy this in production, would you need to deploy a@0.0.0.0 on, say, a DigitalOcean VM and b@0.0.0.0 on another DigitalOcean VM to take advantage of the extra CPU horsepower that clustering affords?

Seeing as they are in the same docker-compose file, I don’t see how that is possible. So the only other conclusion I can draw is that to deploy clustered nodes with libcluster like this, they’d have to run on a single DigitalOcean instance.

But would this actually result in a tangible speedup if they were on the same vm?

If this helps anyone: I used this docker-compose.yml to deploy two clustered Elixir nodes behind a Traefik router. YMMV.

version: "3.7"
services:
  app1:
    build: .
    hostname: "foo.dev"
    environment:
      - RELEASE_NODE=app@foo.dev
      - RELEASE_DISTRIBUTION=name
      - COOKIE=foobar
      - ERLANG_COOKIE=foobar
    env_file:
      - .envrc
    depends_on:
      - karkov_db
    networks:
      - proxy-network
    labels:
      - "traefik.enable=true"
      - "traefik.port=4001"
      - "traefik.http.routers.ws-foo.rule=Host(`ws-foo.myapp.com`)"
      - "traefik.http.middlewares.ws-foo-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.ws-foo.middlewares=ws-foo-https-redirect"
      - "traefik.http.routers.ws-foo-secure.rule=Host(`ws-foo.myapp.com`)"
      - "traefik.http.routers.ws-foo-secure.tls=true"
      - "traefik.http.routers.ws-foo-secure.tls.certresolver=http"
      - "traefik.http.routers.ws-foo-secure.service=ws-foo"
      - "traefik.http.services.ws-foo.loadbalancer.server.port=4001"
      - "traefik.docker.network=proxy-network"
    ports:
      - "127.0.0.1:4001:4001"
  app2:
    build: .
    hostname: "bar.dev"
    environment:
      - RELEASE_NODE=app@bar.dev
      - RELEASE_DISTRIBUTION=name
      - COOKIE=foobar
      - ERLANG_COOKIE=foobar
    env_file:
      - .envrc
    depends_on:
      - karkov_db
    networks:
      - proxy-network
    labels:
      - "traefik.enable=true"
      - "traefik.port=4002"
      - "traefik.http.routers.ws-bar.rule=Host(`ws-bar.myapp.com`)"
      - "traefik.http.middlewares.ws-bar-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.ws-bar.middlewares=ws-bar-https-redirect"
      - "traefik.http.routers.ws-bar-secure.rule=Host(`ws-bar.myapp.com`)"
      - "traefik.http.routers.ws-bar-secure.tls=true"
      - "traefik.http.routers.ws-bar-secure.tls.certresolver=http"
      - "traefik.http.routers.ws-bar-secure.service=ws-bar"
      - "traefik.http.services.ws-bar.loadbalancer.server.port=4001"
      - "traefik.docker.network=proxy-network"
    ports:
      - "127.0.0.1:4002:4001"
  karkov_db:
    image: postgis/postgis:14-master
    command: postgres -c shared_preload_libraries=pg_stat_statements -c pg_stat_statements.max=10000 -c pg_stat_statements.track=all
    environment:
      POSTGRES_DB: "karkov"
      POSTGRES_USER: "postgres"
      POSTGRES_PASSWORD: "postgres"
      POSTGRES_HOST_AUTH_METHOD: "md5"
    volumes:
      - database-storage:/var/lib/postgresql/data
    networks:
      - proxy-network
    ports:
      - "127.0.0.1:5432:5432"
volumes:
  database-storage:
    driver: local
networks:
  proxy-network:
    external: true

I have two separate docker-compose.yml files: one for two clustered frontend nodes, and one for a “server” node that I wish to ADD as a node to the libcluster nodes.

I fire up the server node first (via docker-compose up), where in application.ex I put:

Node.start(:"server@127.0.0.1")
Node.set_cookie(:foobar)

(I do not configure libcluster in the server app… perhaps this is the issue?)

Then I start up the frontend nodes via docker-compose.

The frontend libcluster config in runtime.exs:

config :libcluster,
  topologies: [
    example: [
      strategy: Cluster.Strategy.Epmd,
      config: [hosts: [:"app@foo.dev", :"app@bar.dev", :"server@127.0.0.1"]],
      connect: {:net_kernel, :connect_node, []},
      disconnect: {:erlang, :disconnect_node, []},
      list_nodes: {:erlang, :nodes, [:connected]}
    ]
  ]

But the message I get in the frontend nodes on docker-compose up is:

app2_1 | 17:37:03.675 [warning] [libcluster:example] unable to connect to :"server@127.0.0.1"
app1_1 | 17:37:03.673 [warning] [libcluster:example] unable to connect to :"server@127.0.0.1"

both docker-compose have:

networks:
  proxy-network:
    external: true

(So I’m assuming that libcluster should be able to find the other nodes via the docker network.)

When I try:

network_mode: "host"
ports: [4000:4000]

with the “server” node, I get:

"host" network_mode is incompatible with port_bindings

So I am thinking this means I don’t need “host” network_mode because I already have a docker network(?)

Still not sure why the frontend nodes cannot see the “server” node. Of course this is based on the assumption that they should see each other over libcluster. (EDIT: As I’m writing this, I realize that do not configure libcluster in the server app… perhaps this is the issue?) Any ideas?