Use Fly.io internal DNS for resolving DATABASE_URL

PJUllrich · April 12, 2021, 8:33am

I am trying to use fly.io to deploy a simple Phoenix + Ecto app, but have problems connecting Ecto to the provisioned Postgres instance.

I created a Postgres instance with flyctl postgres create and attached it to the application with flyctl postgres attach --postgres-app my-app-postgres. This adds a DATABASE_URL environment variable to the application context, which I read in my releases.exs. I checked, and my application receives the correct DATABASE_URL. Unfortunately, Postgrex cannot connect to the database and fails with the error:

Postgrex.Protocol (#PID<0.2945.0>) failed to connect: ** (DBConnection.ConnectionError) 
tcp connect (my-app-postgres.internal:5432): non-existing domain - :nxdomain

The problem seems to be Fly’s own DNS resolver, which should resolve the my-app-postgres.internal part in the DATABASE_URL to an IP address. I saw that the LiveView-Counter example project uses a custom DNS Strategy to resolve the APP_NAME.internal URLs. I wondered whether I can set a similar strategy for Ecto to use.

My assumption is that Ecto tries to resolve the my-app-postgres.internal URL with a public DNS server instead of the Fly.io internal DNS, which is responsible for the .internal URLs in Fly’s internal network.

My question is therefore: Do you know how I could configure Phoenix or Ecto to use the internal DNS for the .internal-URLs?

I saw a similar question, where the network_mode in the docker-compose.yml was used to let Phoenix discover other services through the host network, but with Fly.io, one can only use a Dockerfile and not a docker-compose.yml file to create the application. So, I wouldn’t know how to set the network_mode inside the Dockerfile.

Edit: I set the private_network=true flag in my fly.toml file, but it didn’t help:

app = "my-app"

kill_signal = "SIGINT"
kill_timeout = 5

[experimental]
  private_network=true

[[services]]
  internal_port = 4000
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = "80"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    port = "4000"
    restart_limit = 6
    timeout = "2s"

Exadra37 · April 12, 2021, 8:41am

@mrkurt will be the right person to help you here.

mrkurt · April 12, 2021, 1:13pm

Ecto doesn’t speak IPv6 by default in Phoenix. Will you try adding this to your Ecto repo config?

socket_options: [:inet6]
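
For context, here is a minimal sketch of where that option could go. It assumes the config lives in config/releases.exs and that the app is called :my_app with a repo module MyApp.Repo; both names are placeholders, not taken from the original project. As far as I understand, the .internal names only resolve to IPv6 addresses, so the default IPv4 lookup ends in :nxdomain.

```elixir
# config/releases.exs (read when the release starts)
# :my_app and MyApp.Repo are placeholder names for illustration.
import Config

config :my_app, MyApp.Repo,
  # DATABASE_URL is set by `flyctl postgres attach`
  url: System.fetch_env!("DATABASE_URL"),
  # Ask the underlying TCP socket to resolve and connect over IPv6
  socket_options: [:inet6]
```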

We need better docs for Elixir apps.

PJUllrich · April 12, 2021, 1:52pm

Yes, that did the trick. Thank you very much!

mrkurt · April 12, 2021, 2:55pm

No problem! I just submitted a PR to make this work magically with new Phoenix apps: Enable IPv6 for Ecto (Pull Request #4289, phoenixframework/phoenix).

PJUllrich · April 13, 2021, 8:36pm

I threw together a quick blog post about how to get started with Elixir and fly.io.

@mrkurt if you want to use parts of the blog post, or all of it, for your docs, you have my permission to do so. Thanks again for your help!

mrkurt · April 13, 2021, 8:37pm

Oh wow that’s amazing.

Exadra37 · April 13, 2021, 8:57pm

# These two environment variables will be overwritten when the application is started.
# They are needed here to satisfy the env-variable checks in `prod.secret.exs`
ENV SECRET_KEY_BASE=nokey
ENV DATABASE_URL=nodb

Instead you can delete prod.secret.exs, empty the prod.exs file without deleting it, and move the contents of both files to runtime.exs.
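
As a rough sketch of what that move could look like, assuming Elixir 1.11 or later (which introduced config/runtime.exs) and an app named :my_app with MyApp.Repo and MyAppWeb.Endpoint (placeholder names, adjust to your project):

```elixir
# config/runtime.exs: evaluated when the release boots, not at compile time.
# :my_app, MyApp.Repo and MyAppWeb.Endpoint are placeholder names.
import Config

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "environment variable DATABASE_URL is missing"

  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise "environment variable SECRET_KEY_BASE is missing"

  config :my_app, MyApp.Repo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")

  config :my_app, MyAppWeb.Endpoint,
    http: [port: String.to_integer(System.get_env("PORT") || "4000")],
    secret_key_base: secret_key_base
end
```

Because this file is only read at boot, the environment checks no longer run while the image is being built, so the placeholder ENV lines should not be needed.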

ADD . .

This may cause issues when the target you are building for uses different Phoenix/Elixir/Erlang versions from the ones on your host, unless you remove some folders and the lock files:

ADD . .

RUN rm -rf _build deps assets/node_modules mix.lock package-lock.json

To compile the release prefer instead:

RUN mix deps.get --only prod && \
  npm --prefix ./assets ci --progress=false --no-audit --loglevel=error && \
  npm run --prefix ./assets deploy && \
  mix phx.digest && \
  mix compile && \
  mix release

This is not necessary at all:

EXPOSE 4000

Also, as a best security practice an app should run in its own unprivileged dedicated user in the system, therefore you shouldn’t use this:

USER nobody:nobody

I would recommend instead this Dockerfile:

ARG ELIXIR_VERSION
ARG OTP_VERSION
ARG ALPINE_VERSION

FROM hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-alpine-${ALPINE_VERSION} as build

ARG BUILD_RELEASE_FROM=master

ENV MIX_ENV=prod

WORKDIR /app

RUN \
  apk upgrade --no-cache && \
  apk add \
    --no-cache \
    openssh-client \
    build-base \
    npm \
    git \
    python3 && \
  mix local.hex --force && \
  mix local.rebar --force

# @TODO Fix use of secrets in .env. Prefer to use instead docker secrets.
COPY .env /release/.env
COPY ./.git /workspace

RUN \
  git clone --local /workspace . && \
  git checkout "${BUILD_RELEASE_FROM}" && \
  ls -al && \
  mix deps.get --only prod && \
  npm --prefix ./assets ci --progress=false --no-audit --loglevel=error && \
  npm run --prefix ./assets deploy && \
  mix phx.digest && \
  mix compile && \
  mix release

# Start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM alpine:${ALPINE_VERSION} AS app

ENV USER="phoenix"
ENV HOME=/home/"${USER}"
ENV APP_DIR="${HOME}/app"

RUN \
  apk upgrade --no-cache && \
  apk add --no-cache \
    openssl \
    ncurses-libs && \
  # Creates an unprivileged user to run the app
  addgroup \
   -g 1000 \
   -S "${USER}" && \
  adduser \
   -s /bin/sh \
   -u 1000 \
   -G "${USER}" \
   -h "${HOME}" \
   -D "${USER}" && \

  su "${USER}" sh -c "mkdir ${APP_DIR}"

# Everything from this line onwards will run in the context of the unprivileged user.
USER "${USER}"

WORKDIR "${APP_DIR}"

COPY --from=build --chown="${USER}":"${USER}" /app/_build/prod/rel/tasks ./

ENTRYPOINT ["./bin/tasks"]

# Docker Usage:
#  * build: sudo docker build -t phoenix/tasks .
#  * shell: sudo docker run --rm -it --entrypoint "" -p 80:4000 -p 443:4040 phoenix/tasks sh
#  * run:   sudo docker run --rm -it -p 80:4000 -p 443:4040 --env-file .env --name tasks phoenix/tasks
#  * exec:  sudo docker exec -it tasks sh
#  * logs:  sudo docker logs --follow --tail 10 tasks
#
# Extract the production release to your host machine with:
#
# ```
# sudo docker run --rm -it --entrypoint "" --user $(id -u) -v "$PWD/_build:/home/phoenix/_build"  phoenix/tasks sh -c "tar zcf /home/phoenix/_build/app.tar.gz ."
# ls -al _build
# ```
CMD ["start"]

PJUllrich · April 13, 2021, 9:13pm

Ah thanks a lot for the feedback! Just a few quick comments:

Instead you can delete prod.secret.exs

Yes, you’re right about this. I didn’t want to change the generated application more than necessary to keep the blog post short, but I’d probably remove this as well.

unless you remove some folders and the lock files:

I added those folders (and now also the *-lock-files, thanks for that!) to the .dockerignore-file, which has the same effect, right?

To compile the release prefer instead

May I ask what the advantage is of putting all these steps into a single RUN? Is it to prevent the sub-steps from being cached, so that if, for example, mix release fails, the assets are also recompiled rather than taken from the cache?

EXPOSE 4000

Thanks, I removed it.

Also, as a best security practice an app should run in its own unprivileged dedicated user in the system

Ah, that’s very good to know! I changed it in my Dockerfile, will evaluate it tomorrow, and will update the blog post once I have verified that it works. Thanks for that as well!

Exadra37 · April 13, 2021, 9:22pm

If you look closely, the commands I suggested differ slightly from yours; they come directly from the Elixir docs. You can keep them separated, but you may want to adopt the official way of doing it.

For me, the Dockerfile is for building a production release, so I prefer not to use the cache at all; I even pass the --no-cache flag on the command line.

In development I use this Dockerfile:

ARG ELIXIR_VERSION
ARG OTP_VERSION
ARG ALPINE_VERSION

FROM hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-alpine-${ALPINE_VERSION} as build

ARG MIX_ENV=dev

ENV MIX_ENV=${MIX_ENV}

ENV USER="developer"
ENV HOME=/home/"${USER}"
ENV APP_DIR="${HOME}/workspace"

RUN \
  apk upgrade --no-cache && \
  apk add \
    --no-cache \
    inotify-tools \
    openssh-client \
    build-base \
    npm \
    git && \
  # Creates an unprivileged user to run the app
  addgroup \
   -g 1000 \
   -S "${USER}" && \
  adduser \
   -s /bin/sh \
   -u 1000 \
   -G "${USER}" \
   -h "${HOME}" \
   -D "${USER}" && \

  su "${USER}" sh -c "mkdir ${APP_DIR}"

# Everything from this line onwards will run in the context of the unprivileged user.
USER "${USER}"

RUN \
  mix local.hex --force && \
  mix local.rebar --force

ARG GIT_USER_DEPLOY_TOKEN

RUN \
  # @link https://github.com/elixir-lang/elixir/issues/3422#issuecomment-388188608
  # @link https://gist.github.com/Kovrinic/ea5e7123ab5c97d451804ea222ecd78a
  git config --global url."https://exadra37:${GIT_USER_DEPLOY_TOKEN}@gitlab.com".insteadOf git://gitlab.com

WORKDIR "${APP_DIR}"

CMD ["sh"]