I deployed my Phoenix application using Docker and I'm not able to access the Elixir shell of the application. This is what it looks like:
$ bin/app_name remote
Could not connect to "app_name"
Is there any reason why this happens?
There are so many ways to deploy a Phoenix application using Docker that you'll have to give us more information about it.
Generally, the error says there's no connection: the remote shell either doesn't know how to reach, or cannot reach, the pod that is running the actual application. This is a networking and discovery configuration issue; the container in which you start the remote console has to know where to find the container running the Phoenix server…
If you are doing that via Kubernetes/GKE, please let me know. I have a blog post 80% finished describing how to do precisely the above, but I lacked the motivation to wrap it up.
@hubertlepicki I deployed the app with the standard Dockerfile generated by running mix phx.gen.release --docker, using Kamal. Here's the entire Dockerfile:
# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
# - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
# - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20240904-slim - for the release image
# - https://pkgs.org/ - resource for finding needed packages
# - Ex: hexpm/elixir:1.17.2-erlang-27.0.1-debian-bullseye-20240904-slim
#
ARG ELIXIR_VERSION=1.17.2
ARG OTP_VERSION=27.0.1
ARG DEBIAN_VERSION=bullseye-20240904-slim
ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"
FROM ${BUILDER_IMAGE} as builder
# install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# prepare build dir
WORKDIR /app
# install hex + rebar
RUN mix local.hex --force && \
mix local.rebar --force
# set build ENV
ENV MIX_ENV="prod"
ENV ERL_FLAGS="+JPperf true"
# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config
# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile
COPY priv priv
COPY lib lib
COPY assets assets
# compile assets
RUN mix assets.deploy
# Compile the release
RUN mix compile
# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/
COPY rel rel
RUN mix release
# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}
RUN apt-get update -y && \
apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
WORKDIR "/app"
RUN chown nobody /app
# set runner ENV
ENV MIX_ENV="prod"
# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/app_name ./
USER nobody
# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]
EXPOSE 4000
CMD ["sh", "-c", "bin/app_name eval AppName.Release.migrate && bin/app_name start"]
I'm not sure why it's not working. The app is deployed and I'm able to access the container on my server using docker exec -it container-id /bin/bash. Within the container, I'm trying to access the Elixir shell using the command bin/app_name remote.
Oh, I don't know how Kamal does this stuff.
I cannot say for sure, but one of the problems might be how your node is named. Here you can take a look at a release config that works: Blame · rel/env.sh.eex · main · SSL MOON / SSL MOON · GitLab
I still haven't found the fix, but the issue seems to be occurring because Kamal sets the --hostname flag on the Docker container to your server IP followed by a hash.
I believe this causes the started IEx shell to be unable to connect to the server, since they end up on two different networks or something like that.
I will try to debug some more tomorrow and see if I can find a solution.
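In case it helps anyone debugging the same thing, this is roughly how you can check what hostname the container actually got (a sketch; the container name is a placeholder):
# print the hostname Docker assigned to the container
docker inspect --format '{{.Config.Hostname}}' container-id
# or, from inside the container
hostname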
Alright, that was quite the ride, but I think I finally figured it out.
Adding this to rel/env.sh.eex fixed it for me:
#!/bin/bash
export RELEASE_DISTRIBUTION=name
Note that I am still new to the Elixir/BEAM ecosystem so I might be wrong on some things. If something doesn’t make sense, please correct me.
The bin/<app-name> remote command connects to your running Phoenix app using a remote shell. As far as I understand, this starts a new BEAM node and connects to your existing Phoenix node using the magic of the BEAM. This is the same thing you would do to connect two BEAM nodes together in a distributed cluster; the only difference is that it is all happening on the same host.
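For the curious, as far as I can tell bin/app_name remote is roughly equivalent to starting a throwaway node and attaching it to the running one (a rough sketch; the node names and cookie are placeholders):
# roughly what `bin/app_name remote` does under the hood
iex --name rem-12345@my-host --cookie "$RELEASE_COOKIE" --remsh app_name@my-host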
The way the BEAM discovers how to connect to nodes is with epmd (Erlang Port Mapper Daemon). When a BEAM node starts, it registers itself with epmd using its name. By default, this is the name of your Phoenix app when using Phoenix releases. If you then want to connect remotely to the node, you can use one of two formats to identify it: name or sname.
sname is used by default. When using sname, you can just pass the name of the app to epmd and it will try to look for that node on your local machine. The format for node names is name@host, but when using sname, the @host portion is implicit since it is always the local host.
name, on the other hand, is used when you also want to connect to BEAM nodes on remote machines. It takes the format name@host, where @host is the domain name or IP address of the machine that holds the node you want to connect to.
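To make that concrete, here is a rough illustration of the two naming modes (the names are placeholders):
# short names: the host part is implicit and contains no dots
iex --sname app            # node becomes app@<short-hostname>
# long names: the host part is explicit (FQDN or IP)
iex --name app@127.0.0.1
# epmd can list what is registered on this machine
epmd -names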
Now, what I believe happens is that when you use sname, it seems to default to using everything before the first dot of whatever hostname is, and uses that as the implicit @host part. In most cases, this is fine. But if the hostname is set to an IP address, epmd will take what it thinks is the "top level domain", i.e. the first 1-3 digits of the IP address. So for example, if you have a hostname of 123.456.789, epmd will take 123 as the "top level domain" and use that as the implicit @host (ex: app@123).
The problem this causes is that when epmd tries to resolve the 123 host, it leads nowhere. Resolving number-only domains results in weird behaviour most of the time. And so epmd just can't find the nodes on localhost, since it tries to look for the node on a host that doesn't exist.
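A rough illustration of that truncation (the hostname and prompt are made up for the example):
# on a machine whose hostname is 123.456.789-hgjkagh
iex --sname app
# iex(app@123)1>    <- the host part is cut off at the first dot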
Now, Kamal sets the --hostname flag on the Docker container to the IP of the server/role that you specified in deploy.yml, followed by what seems to be a random hash, e.g. 123.456.789-hgjkagh. This causes the problem explained above and makes epmd unable to find your localhost nodes.
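You can see this from inside the running container (a sketch; the values are just what I'd expect, not copied from a real deployment):
# the hostname Docker assigned via --hostname
hostname              # e.g. 123.456.789-hgjkagh
# Docker also writes a matching entry into /etc/hosts for the container's IP
cat /etc/hosts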
So now, here is the solution: when RELEASE_DISTRIBUTION is set to name, the default @host (in Phoenix releases at least) will be the full hostname and not just the part before the first dot. This works because when you pass the --hostname flag to Docker, it adds an entry to /etc/hosts pointing that hostname to the IP of the Docker container on the network. This enables epmd to resolve that hostname to your current Docker container, and it is then able to find the node to connect to.
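With that env.sh.eex change in place, this is roughly how I'd verify it inside the container (a sketch; epmd ships with the bundled ERTS, so the path is an assumption about where it lands in the release):
# list what the running release registered with epmd
./erts-*/bin/epmd -names
# the remote shell should now attach to the running node
bin/app_name remote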
Hope this helps. If you know more about this topic and believe I made a wrong assumption, please let me know; I would be very interested to learn.
Isn't that exactly the same thing as what I pointed to?
Not really. This works when the --hostname flag doesn't start with numbers followed by a dot, i.e. when you don't set a hostname at all, or when the hostname contains both numbers and letters.
This would work: --hostname sslmoon;
This would also work: --hostname 123sslmoon;
But this wouldn't work: --hostname 123.sslmoon;
And this also wouldn't work: --hostname 123.456.789-sslmoon (which is essentially what Kamal gives us).
You can try it by starting the Docker container locally and specifying the hostname, as shown below.
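Something like this, assuming you have the image built locally (the image and app names are placeholders):
# reproduce the broken case locally with a Kamal-style hostname
docker run --rm -it --hostname 123.456.789-sslmoon app_image /bin/bash
# then, inside the container:
bin/app_name start &
bin/app_name remote      # fails until RELEASE_DISTRIBUTION=name is set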
Ah NVM, I see it now. Glad you figured it out!