Connecting Livebook to a node in a remote Docker container

I have an Elixir app running on a remote server using Dokku. I’d like to connect to it from my local machine using Livebook, via an SSH tunnel.

I can’t seem to make the remote container visible to my local Livebook.

I’ve tried the top StackOverflow comments and have tried setting RELEASE_NODE, RELEASE_NAME, and RELEASE_DISTRIBUTION. I have RELEASE_COOKIE configured and am using this.

I have tried using RELEASE_NODE as <app_name>@localhost, <app_name>@127.0.0.1, and <app_name>@<server_ip>.

I gather there’s some shenanigans trying to get EPMD to see the remote node. I’ve tried setting ERL_DIST_PORT to 9000, forwarding Dokku’s host 9000 to the container’s 9000, and opening an SSH tunnel with 9000 open.

Is there a straightforward-ish solution or problem I’m missing? I appreciate there’s a few moving parts.


Hey @k-p! There are a few elements:

  1. The remote node needs to start distribution on a known port. I believe ERL_DIST_PORT is used by rebar3/relx, not the Elixir releases. To force a specific port you can do ELIXIR_ERL_OPTIONS="-erl_epmd_port 9000" (or pass it in vm.args).

  2. The distribution port needs to be forwarded to a local port, let’s assume it’s the same.

  3. The remote node needs to have 127.0.0.1 hostname (or if it uses a domain name, you can edit your local /etc/hosts to resolve it to 127.0.0.1).

  4. With the above steps, the node appears as if it was running locally. However, the missing piece is that the local EPMD doesn’t know about this node. You can register the node manually by running this script:

    # epmd_register_node.exs
    
    defmodule EPMD do
      def epmd_register_node(node_name, port) do
        # Registers the given node under the given port in EPMD.
        #
        # We open a TCP connection to EPMD and send the registration
        # request. We keep the socket open. The node is automatically
        # unregistered when the calling process terminates.
        #
        # See the EPMD protocol [1] and the reference implementation [2].
        #
        # [1]: https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html#register-a-node-in-epmd
        # [2]: https://github.com/erlang/otp/blob/OTP-27.0/lib/kernel/src/erl_epmd.erl#L403-L433
    
        epmd_host = {127, 0, 0, 1}
        epmd_port = 4369
    
        case :gen_tcp.connect(epmd_host, epmd_port, [:binary, packet: :raw, active: false]) do
          {:ok, socket} ->
            request =
              <<
                # ALIVE2_REQ
                120::8,
                # Node distribution port
                port::16,
                # Node type (normal)
                77::8,
                # Protocol (TCP/IPv4)
                0::8,
                # Highest and lowest version of the distribution protocol,
                # see https://github.com/erlang/otp/blob/OTP-27.0/lib/kernel/include/dist.hrl#L88-L89
                6::16,
                6::16,
                # Node name
                byte_size(node_name)::16,
                node_name::binary,
                # Extra
                0::16
              >>
    
            data = <<byte_size(request)::16, request::binary>>
            :ok = :gen_tcp.send(socket, data)
    
            case :gen_tcp.recv(socket, 0) do
              {:ok, data} ->
                result =
                  case data do
                    # ALIVE2_X_RESP
                    <<118, result::8, _creation::32>> -> result
                    # ALIVE2_RESP
                    <<121, result::8, _creation::16>> -> result
                  end
    
                if result == 0 do
                  :ok
                else
                  :gen_tcp.close(socket)
                  {:error, "failed to register node in EPMD, result code: #{result}"}
                end
    
              {:error, reason} ->
                :gen_tcp.close(socket)
                {:error, "failed to receive response from EPMD, reason: #{inspect(reason)}"}
            end
    
          {:error, reason} ->
            {:error, "failed to connect to EPMD, reason: #{inspect(reason)}"}
        end
      end
    end
    
    
    EPMD.epmd_register_node("mynodename", 9000) |> IO.inspect()
    
    Process.sleep(:infinity)
    

    Make sure to change the node name at the end; it should be the base name, without the hostname part. The registration is kept until you kill the script.
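
To confirm that the registration took effect, you can ask the local EPMD what it knows (roughly what epmd -names prints). The following is a minimal sketch and not part of the original script: EPMDCheck and its functions are hypothetical names, and it assumes EPMD is reachable on 127.0.0.1 at its default port.

```elixir
defmodule EPMDCheck do
  # Hypothetical helper: strips the "@host" part, since EPMD only
  # stores the base name of a node.
  def base_name(full_name) when is_binary(full_name) do
    full_name |> String.split("@", parts: 2) |> hd()
  end

  # Returns {:ok, port} if the local EPMD lists the node, or
  # {:error, reason} otherwise (including when EPMD is not running).
  def registered_port(full_name, epmd_host \\ {127, 0, 0, 1}) do
    # :erl_epmd.names/1 returns names as charlists, so convert first.
    name = full_name |> base_name() |> String.to_charlist()

    case :erl_epmd.names(epmd_host) do
      {:ok, names} ->
        case List.keyfind(names, name, 0) do
          {^name, port} -> {:ok, port}
          nil -> {:error, :not_registered}
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end

IO.inspect(EPMDCheck.registered_port("mynodename@127.0.0.1"))
```

If the registration script is running, the call should report the port you passed to it (9000 in the example above).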

Technically steps 3 and 4 could be done by using a custom EPMD module, but since we are talking about Livebook, the solution assumes no changes to the EPMD module. (In fact, Livebook already uses a custom EPMD for a similar purpose.)


Hello! Thanks for your help and your script to register the node.

I get as far as seeing “connection refused” in the SSH tunnel when I attempt to connect to the node, so I can see that Livebook is trying to reach the remote node over the port. But I think I’m still stuck somewhere in steps 1-3.

  1. Done, and I can see that this port has been picked up if I cat /proc/net/tcp after entering the container. If I try to run bin/app remote for an IEx session, I receive an “address in use” error.
  2. Done
  3. This might be where I’m stuck. I have set RELEASE_NODE to app@127.0.0.1.

I’m quite a novice with distributed Elixir, apologies. If I enter the container, and run epmd -names I can see that epmd is still running on port 4369 and the node (name app at port 9000) is running on port 9000, with the config set in step one. Is that right? The argument to me reads as though epmd should be on port 9000.


Just to make sure, you are using RELEASE_DISTRIBUTION=name, RELEASE_NODE=app@127.0.0.1 and the same cookie?

If I enter the container, and run epmd -names I can see that epmd is still running on port 4369 and the node (name app at port 9000) is running on port 9000, with the config set in step one. Is that right? The argument to me reads as though epmd should be on port 9000.

This is correct. The argument name is confusing, and in fact it’s going to change in the next OTP version. It is supposed to set the port that the node uses for distribution, not the EPMD port itself. So name app at port 9000 looks good.

If the connection is refused, the only thing I can think of is something with the forwarding. For the ssh forwarding maybe try 127.0.0.1 and not localhost, unless you already do that. Otherwise it could be the port forwarding to Docker.
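
To narrow down where the connection is refused, a plain TCP probe against the forwarded ports can help. This is only a sketch: PortProbe is a hypothetical helper, and ports 9000 and 4369 are the ones assumed earlier in this thread. Note that with an SSH tunnel the local end may accept the connection even when the remote side refuses it, so a successful probe only shows that something is listening locally.

```elixir
defmodule PortProbe do
  # Hypothetical helper: attempts a TCP connection and closes it right away.
  # Returns :ok if something accepted the connection, {:error, reason} otherwise.
  def check(host, port, timeout_ms \\ 2_000) do
    case :gen_tcp.connect(host, port, [:binary, active: false], timeout_ms) do
      {:ok, socket} ->
        :gen_tcp.close(socket)
        :ok

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Forwarded distribution port (assumed to be 9000, as above).
IO.inspect(PortProbe.check({127, 0, 0, 1}, 9000), label: "distribution port")
# Local EPMD on its default port.
IO.inspect(PortProbe.check({127, 0, 0, 1}, 4369), label: "local EPMD")
```

Running the same probe inside the container and on the Docker host, as well as locally, should show which hop drops the connection.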


Hey, I’m trying to do the same thing without the ssh tunnel part.

So far I can connect a local iex session to the remote node (with Node.connect), but only using its IP; using its hostname won’t work for some reason (while a local iex session with --remsh app@example.com works).

And I can’t get my local Livebook working with the remote node. I started it with LIVEBOOK_NODE, LIVEBOOK_DISTRIBUTION and ERL_AFLAGS set, but no luck: Livebook won’t connect to the remote node.

Any idea what the problem might be? I found another topic which looks like my issue, but my MTU is already set to 1500, so no luck there either.

Ok found it, starting livebook without the ERL_AFLAGS and now it connects!

Here is my configuration for anyone looking for the same thing:

Release

Create rel/remote.vm.args.eex with:

-erl_epmd_port 9000

So that docker compose exec app bin/app remote works

Remote

services:
  app:
    ...
    environment:
      ELIXIR_ERL_OPTIONS: "-erl_epmd_port 9000"
      RELEASE_DISTRIBUTION: name
      RELEASE_NODE: app@example.com
      RELEASE_COOKIE: cookie
    ports:
      - 4369:4369
      - 9000:9000

Local

docker run -p 8080:8080 -p 8081:8081 --pull always -e LIVEBOOK_NODE="livebook@127.0.0.1" -e LIVEBOOK_DISTRIBUTION=name -e LIVEBOOK_DEFAULT_RUNTIME="attached:app@example.com:cookie" ghcr.io/livebook-dev/livebook
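
Before involving Livebook, it can be useful to confirm that a plain local node reaches the release, as with the Node.connect check mentioned earlier. A rough sketch, assuming the local VM was started with distribution (for example iex --name debug@127.0.0.1) and reusing the placeholder node name and cookie from the configuration above; RemoteAttach is a hypothetical module name:

```elixir
defmodule RemoteAttach do
  # Hypothetical helper: sets the cookie for the remote node and tries to connect.
  # Guarded with Node.alive?/0 because cookies cannot be set on a
  # non-distributed node.
  def attach(remote, cookie) when is_atom(remote) and is_atom(cookie) do
    if Node.alive?() do
      Node.set_cookie(remote, cookie)

      case Node.connect(remote) do
        true -> :ok
        false -> {:error, :unreachable}
        :ignored -> {:error, :local_node_not_alive}
      end
    else
      {:error, :local_node_not_alive}
    end
  end
end

# Placeholder values taken from the compose file above.
IO.inspect(RemoteAttach.attach(:"app@example.com", :cookie))
```

If this returns :ok, Node.list() should include the remote node, and the remaining problems are more likely on the Livebook side.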

I’m trying to better understand how to connect a local BEAM node to one running on a remote server in a container, for debugging and inspecting the system and such (not necessarily with Livebook), and I still don’t really get it.

  • The remote node needs to start distribution on a known port. I believe ERL_DIST_PORT is used by rebar3/relx, not the Elixir releases. To force a specific port you can do ELIXIR_ERL_OPTIONS="-erl_epmd_port 9000" (or pass it in vm.args).

I don’t get why this works. I thought erl_epmd_port sets the port that epmd listens on (epmd — erts v15.2.2), not the distribution port of the node itself? What am I missing?

@_jonas you linked to the ERL_EPMD_PORT env var, which indeed sets the port for epmd to listen on. However, the -erl_epmd_port flag sets the node distribution port. It is unfortunate that they are named the same; in recent OTP the flag has been deprecated in favour of a kernel application parameter named erl_epmd_node_listen_port, which is more accurate.


Ohhh that makes sense, thanks for explaining!
Good thing it’s being changed to that less confusingly named flag, I would have never guessed :sweat_smile:

So just to summarize what I’ve gathered for myself then:

  • On a remote BEAM instance, we can manually set the distribution port with erl_epmd_node_listen_port
  • We can probably even start that instance without its epmd since it’s essentially bypassed; I saw there is a flag for this as well (erl — erts v15.2.2)
  • Now we can forward that distribution port to our local host and connect to the remote node from a local one through it
    • The suggestion to facilitate that in this thread is to register the node with the local epmd manually
    • I’ve also read somewhere that it’s possible to connect directly to another node via its port, fully bypassing epmd, if both the local and remote node are running without epmd and have their distribution port set manually, though I can’t really find good docs on this

If you set -erl_epmd_port (or the new erl_epmd_node_listen_port) for the local node, then it’s going to assume this port is used by all nodes it connects to.

The issue is that if you are using a proxy, let’s say 127.0.0.1:4444 -> remote.node:5555, then you would need to set -erl_epmd_port 4444, but the local node cannot start distribution at that port, since it’s already taken by the proxy.
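
The conflict can be reproduced without any nodes involved. In this sketch the first listener stands in for the proxy, and the second listen call stands in for the local node trying to start distribution on the same port. PortClash is a hypothetical name, and the port is picked by the OS rather than being 4444.

```elixir
defmodule PortClash do
  # Hypothetical demo: bind a port once (the "proxy"), then try to bind it
  # again (the "local node"), and return the result of the second attempt.
  def demo do
    opts = [:binary, active: false, ip: {127, 0, 0, 1}]

    # Port 0 asks the OS for any free port.
    {:ok, proxy} = :gen_tcp.listen(0, opts)
    {:ok, port} = :inet.port(proxy)

    second_attempt = :gen_tcp.listen(port, opts)

    :gen_tcp.close(proxy)
    second_attempt
  end
end

IO.inspect(PortClash.demo())
# {:error, :eaddrinuse}
```

This is the same kind of “address in use” failure a node hits when it is told to listen on a port that is already bound.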


Ah, that is good to know!

I thought when running without epmd, there would be some way to manually specify the distribution port of another node to connect to, similar to how you would normally specify the name.
I guess I was mistaken – then I suppose your suggested way of running with a local epmd and registering the remote node manually is indeed the best option, thanks again for explaining.