Postgrex error (nxdomain) when resolving rootless docker-compose database domain

I’m attempting to set up a rootless docker-compose deployment of a Phoenix app using Podman 4. Unfortunately, when the Phoenix app starts, Ecto throws an error because it cannot resolve the database’s domain name (db in the example below):

[error] Postgrex.Protocol (#PID<0.1965.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain

While troubleshooting I’ve overwritten the phoenix app’s container start command with the following:

command: ["eval", "IO.inspect :gen_tcp.connect('db', 5432, [packet: :raw, mode: :binary, active: false], 3000)"]

This yields {:ok, #Port<0.6>}, which suggests to me that the problem lies in how Postgrex uses gen_tcp.
I also tried using inet_res directly:

command: ["eval", "IO.inspect :inet_res.resolve('db', :in, :a)"]

Which seems to successfully resolve the IP:

{:ok, 
    {:dns_rec, {:dns_header, 1, true, :query, false, false, true, false, false, 0},
    [{:dns_query, 'db', :a, :in, false}],
    [{:dns_rr, 'db', :a, :in, 0, 86400, {10, 89, 0, 35}, :undefined, [], false}],
 [], []}}

When I try other valid domains besides db and its aliases, including localhost, I get connection errors. That makes sense because there is no Postgres instance listening there, but name resolution itself seems to pass just fine.
The final thing I’ve noticed on my local machine is that :inet_res.resolve does not resolve domains specified in the hosts file (/etc/hosts), whereas :gen_tcp.connect seems to. That said, the hosts file is not how domain names are set with docker-compose / Podman.
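
For anyone comparing these lookup paths side by side, here is a minimal sketch using only OTP modules. The module name ResolveCheck is hypothetical and not part of the original setup. :inet_res is a pure DNS client, so it is expected to skip /etc/hosts, while :inet.getaddr follows the lookup methods configured for the VM (the same path :gen_tcp.connect uses for host names). The sketch also checks the :inet6 family separately, since a name can resolve for one family and not the other.

```elixir
# Hypothetical helper (not from the original post): compares the different
# name-resolution paths available inside the Erlang VM for one host name.
defmodule ResolveCheck do
  def run(host) when is_binary(host) do
    # The :inet functions expect a charlist, not an Elixir binary.
    name = String.to_charlist(host)

    %{
      # Explicitly configured lookup order, if any. :inet.get_rc/0 only
      # lists non-default settings, so nil means "VM default".
      configured_lookup: List.keyfind(:inet.get_rc(), :lookup, 0),
      # Resolution through the VM's configured lookup methods, IPv4.
      getaddr_v4: :inet.getaddr(name, :inet),
      # The same, but for IPv6 (relevant when socket_options has :inet6).
      getaddr_v6: :inet.getaddr(name, :inet6),
      # DNS only: never consults the hosts file. Returns a list of
      # addresses, empty on failure. 2_000 is the timeout in milliseconds.
      dns_only: :inet_res.lookup(name, :in, :a, [], 2_000)
    }
  end
end

IO.inspect(ResolveCheck.run("localhost"), label: "localhost")
```

Inside the container the same call could be made with "db" in place of "localhost", for example through the release's eval command, and the four results compared.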

Here’s my Repo config (the commented out parts have been tried but haven’t changed the error)

  config :phoenix_app, PhoenixApp.Repo,
    # ssl: true,
    # socket_options: [:inet6],
    url: "ecto://user:pass@db/phoenix_app",
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
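
To switch between the IPv4 and IPv6 variants without rebuilding the image, the socket options could be driven by an environment variable. This is only a sketch of a config/runtime.exs fragment: it assumes the DATABASE_URL variable from the compose file is the source of the URL, and the ECTO_IPV6 variable name is an assumption borrowed from the convention used by recent Phoenix generators, not something present in this setup.

```elixir
# config/runtime.exs (sketch, assumptions noted in the comments)
import Config

if config_env() == :prod do
  # Assumes DATABASE_URL is set in the container environment.
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "environment variable DATABASE_URL is missing"

  # ECTO_IPV6 is an assumed variable name; set it to "true" or "1" to
  # make the connection use the :inet6 address family.
  socket_options =
    if System.get_env("ECTO_IPV6") in ~w(true 1), do: [:inet6], else: []

  config :phoenix_app, PhoenixApp.Repo,
    url: database_url,
    socket_options: socket_options,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
end
```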

Have I simply misconfigured something? Is this a bug in Postgrex? What suggestions do you have for debugging this error?

Is it possible your Postgres container is not started when your app tries to connect?

@joey_the_snake I also had this concern when I started debugging this issue, so I set up a healthcheck for the Postgres database and made the app depend on that healthcheck passing before it starts. From the console logs I can see that this seems to work: it logs that the database is ready before the application starts.

If startup order were the cause, I believe the failure would also show up (at least intermittently) when running the application with the eval commands. The eval commands show that, given the correct configuration, Erlang is at least capable of resolving the domain.
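
A rough way to test the timing theory from inside the VM is to repeat the connection attempt several times and record each outcome. The sketch below is an assumption-laden illustration, not part of the original setup: ConnectProbe and its parameters are hypothetical names, and the socket options simply mirror the ones used in the eval command above. Extra options such as :inet6 can be passed to see whether the address family changes the result.

```elixir
# Hypothetical probe (not from the original post): repeatedly tries to open
# a TCP connection and records whether each attempt succeeded.
defmodule ConnectProbe do
  # Returns a list of {attempt_number, :ok | {:error, reason}} tuples.
  def probe(host, port, attempts \\ 5, delay_ms \\ 500, extra_opts \\ []) do
    # :gen_tcp expects the host name as a charlist.
    name = String.to_charlist(host)
    opts = extra_opts ++ [packet: :raw, mode: :binary, active: false]

    for n <- 1..attempts do
      outcome =
        case :gen_tcp.connect(name, port, opts, 3_000) do
          {:ok, socket} ->
            # Only reachability matters here, so close immediately.
            :gen_tcp.close(socket)
            :ok

          {:error, reason} ->
            {:error, reason}
        end

      # Pause between attempts, but not after the last one.
      if n < attempts, do: Process.sleep(delay_ms)
      {n, outcome}
    end
  end
end

# Example calls, to be run inside the app container:
# ConnectProbe.probe("db", 5432) |> IO.inspect()
# ConnectProbe.probe("db", 5432, 5, 500, [:inet6]) |> IO.inspect()
```

If every attempt returns :ok with the default options but {:error, :nxdomain} with :inet6 (or the other way round), that points at the address family rather than at startup order.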

Please let me know if you think of anything else to try!

Could you please paste a stacktrace of the logged error?

By the way, since Docker released rootless support, I no longer use Podman, because it is still buggy and podman-compose is still in a buggy alpha state.

Here are the relevant logs

phoenix_app-db-1      | PostgreSQL init process complete; ready for start up.
phoenix_app-db-1      |
phoenix_app-db-1      | 2022-10-01 16:26:27.180 UTC [1] LOG:  starting PostgreSQL 13.8 on x86_64-pc-linux-musl, compiled by gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219, 64-bit
phoenix_app-db-1      | 2022-10-01 16:26:27.180 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
phoenix_app-db-1      | 2022-10-01 16:26:27.181 UTC [1] LOG:  listening on IPv6 address "::", port 5432
phoenix_app-db-1      | 2022-10-01 16:26:27.185 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
phoenix_app-db-1      | 2022-10-01 16:26:27.193 UTC [46] LOG:  database system was shut down at 2022-10-01 16:26:27 UTC
phoenix_app-db-1      | 2022-10-01 16:26:27.200 UTC [1] LOG:  database system is ready to accept connections
phoenix_app-web-1     | 16:26:34.279 [notice]     :alarm_handler: {:set, {:system_memory_high_watermark, []}}
phoenix_app-web-1     | 16:26:34.283 [error] Postgrex.Protocol (#PID<0.1956.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.283 [error] Postgrex.Protocol (#PID<0.1963.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1966.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1955.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1964.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1959.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1965.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1960.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1961.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.284 [error] Postgrex.Protocol (#PID<0.1962.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:34.285 [info] Running PhoenixAppWeb.Endpoint with cowboy 2.9.0 at :::8083 (http)
phoenix_app-web-1     | 16:26:34.287 [info] Access PhoenixAppWeb.Endpoint at http://localhost:8083
phoenix_app-web-1     | 16:26:35.409 [error] Postgrex.Protocol (#PID<0.1956.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:35.469 [error] Postgrex.Protocol (#PID<0.1965.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:35.550 [error] Postgrex.Protocol (#PID<0.1959.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:35.584 [error] Postgrex.Protocol (#PID<0.1961.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:35.686 [error] Postgrex.Protocol (#PID<0.1966.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:35.994 [error] Postgrex.Protocol (#PID<0.1955.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:36.118 [error] Postgrex.Protocol (#PID<0.1960.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:36.337 [error] Postgrex.Protocol (#PID<0.1964.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:36.691 [error] Postgrex.Protocol (#PID<0.1962.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:36.999 [error] Postgrex.Protocol (#PID<0.1963.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:37.016 [error] Postgrex.Protocol (#PID<0.1965.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:37.281 [error] Postgrex.Protocol (#PID<0.1956.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (db:5432): non-existing domain - :nxdomain
phoenix_app-web-1     | 16:26:37.282 [debug] QUERY ERROR source="admins" queue=2994.9ms
phoenix_app-web-1     | SELECT TRUE FROM "admins" AS a0 LIMIT 1 []
phoenix_app-web-1     | 16:26:37.285 [notice] Application phoenix_app exited: PhoenixApp.Application.start(:normal, []) returned an error: shutdown: failed to start child: PhoenixApp.VerifyStartingAdmin
phoenix_app-web-1     |     ** (EXIT) an exception was raised:
phoenix_app-web-1     |         ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2995ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
phoenix_app-web-1     |
phoenix_app-web-1     |   1. Ensuring your database is available and that you can connect to it
phoenix_app-web-1     |   2. Tracking down slow queries and making sure they are running fast enough
phoenix_app-web-1     |   3. Increasing the pool_size (although this increases resource consumption)
phoenix_app-web-1     |   4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
phoenix_app-web-1     |
phoenix_app-web-1     | See DBConnection.start_link/2 for more information
phoenix_app-web-1     |
phoenix_app-web-1     |             (ecto_sql 3.8.3) lib/ecto/adapters/sql.ex:932: Ecto.Adapters.SQL.raise_sql_call_error/1
phoenix_app-web-1     |             (ecto_sql 3.8.3) lib/ecto/adapters/sql.ex:847: Ecto.Adapters.SQL.execute/6
phoenix_app-web-1     |             (ecto 3.8.4) lib/ecto/repo/queryable.ex:221: Ecto.Repo.Queryable.execute/4
phoenix_app-web-1     |             (ecto 3.8.4) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
phoenix_app-web-1     |             (ecto 3.8.4) lib/ecto/repo/queryable.ex:130: Ecto.Repo.Queryable.exists?/3
phoenix_app-web-1     |             (phoenix_app 0.2.0) lib/phoenix_app/verify_starting_admin.ex:15: PhoenixApp.VerifyStartingAdmin.init/1
phoenix_app-web-1     |             (stdlib 4.1) gen_server.erl:851: :gen_server.init_it/2
phoenix_app-web-1     |             (stdlib 4.1) gen_server.erl:814: :gen_server.init_it/6
phoenix_app-web-1     | 16:26:37.299 [notice]     :alarm_handler: {:clear, :system_memory_high_watermark}
phoenix_app-web-1     | [os_mon] memory supervisor port (memsup): Erlang has closed
phoenix_app-web-1     | [os_mon] cpu supervisor port (cpu_sup): Erlang has closed

@hst337 FWIW I’m using docker-compose with podman, not podman-compose. I’ll look into switching to docker, but if utilities like ping and modules like gen_tcp can resolve the domain, wouldn’t that indicate that the issue is some higher-level configuration problem?

Hmm, could you please share relevant parts of your docker-compose file?

Of course. Here is my docker-compose file:

version: "3"
services:
  app:
    image: localhost/phoenix_app_${INSTANCE_NAME}:${phoenix_app_INSTANCE_VERSION}
    # image: docker.io/hexpm/elixir:1.14.0-erlang-25.1-alpine-3.16.2
    ports:
      - "8083:8083"
    depends_on:
      db:
        condition: service_healthy
    environment:
      - HOSTNAME=localhost
      - DATABASE_URL=ecto://user:pass@db/phoenix_app
      - SECRET_KEY_BASE=${SECRET_KEY_BASE}
      - PORT=8083
    networks:
      mynetwork:
        aliases:
          - api.local
    # command: ["eval", "IO.inspect :inet_res.resolve('db', :in, :a)"]
    # command: ["eval", "IO.inspect :gen_tcp.connect('db', 5432, [packet: :raw, mode: :binary, active: false], 3000)"]

  db:
    image: postgres:13-alpine
    environment:
        - POSTGRES_USER=user
        - POSTGRES_PASSWORD=pass
        - POSTGRES_DB=phoenix_app
    networks:
      mynetwork:
        aliases:
          - db.local
    healthcheck:
      test: [ "CMD", "pg_isready", "-q", "-d", "phoenix_app", "-U", "user" ]
      timeout: 45s
      interval: 10s
      retries: 10

  server:
    image: caddy:2-alpine
    ports:
        - ${PORT}:80
        - ${SSL_PORT}:443
    volumes:
        - caddy_data:/data
        - caddy_config:/config
        # - ./Caddyfile:/etc/caddy/Caddyfile
volumes:
    caddy_data:
    caddy_config:

networks:
  mynetwork:
    driver: bridge

And my Containerfile if that is at all interesting:

ARG ELIXIR_VERSION=1.14.0
ARG OTP_VERSION=25.1
ARG ALPINE_VERSION=3.16.2

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-alpine-${ALPINE_VERSION}"
ARG RUNNER_IMAGE="alpine:${ALPINE_VERSION}"

FROM ${BUILDER_IMAGE} as build

ARG MIX_ENV="prod"

# install build dependencies
RUN apk add --no-cache build-base git python3 curl

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ARG MIX_ENV
ENV MIX_ENV="${MIX_ENV}"

ARG FORCE_SSL
ENV FORCE_SSL="${FORCE_SSL}"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/$MIX_ENV.exs config/
RUN mix deps.compile

COPY priv priv

# note: if your project uses a tool like https://purgecss.com/,
# which customizes asset compilation based on what it finds in
# your Elixir templates, you will need to move the asset compilation
# step down so that `lib` is available.
COPY assets assets
RUN mix assets.deploy

# compile and build the release
COPY lib lib
RUN mix compile
# changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/
# uncomment COPY if rel/ exists
# COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE} AS app
RUN apk add --no-cache libstdc++ libgcc openssl ncurses-libs ffmpeg

ARG USER
ENV USER="userman"

# Workaround for a Podman bug where MIX_ENV isn't defined in this stage
ARG MIX_ENV="prod"
ENV MIX_ENV="${MIX_ENV}"

WORKDIR "/home/${USER}/app"
# Creates an unprivileged user to be used exclusively to run the Phoenix app
RUN \
  addgroup \
   -g 1000 \
   -S "${USER}" \
  && adduser \
   -s /bin/sh \
   -u 1000 \
   -G "${USER}" \
   -h "/home/${USER}" \
   -D "${USER}" \
  && su "${USER}"

# Everything from this line onwards will run in the context of the unprivileged user.
USER "${USER}"

COPY --from=build --chown="${USER}":"${USER}" /app/_build/"${MIX_ENV}"/rel/phoenix_app ./

ENTRYPOINT ["bin/phoenix_app"]

# Usage:
#  * build: sudo docker image build -t elixir/my_app .
#  * shell: sudo docker container run --rm -it --entrypoint "" -p 127.0.0.1:4000:4000 elixir/my_app sh
#  * run:   sudo docker container run --rm -it -p 127.0.0.1:4000:4000 --name my_app elixir/my_app
#  * exec:  sudo docker container exec -it my_app sh
#  * logs:  sudo docker container logs --follow --tail 100 my_app
CMD ["start"]

I’m really very stumped with this one, so any suggestions are appreciated!

@micah Any progress on this one?

I would bet the postgres service was not starting, either because of configuration issues or because of the healthcheck, which looks suspicious and unnecessary. I have started countless docker-compose setups like this one and they all worked flawlessly; just omit the healthcheck part.

I am getting the same domain error as him. I know my postgres is up and running. Obviously my setup is a bit different, but I am still getting that same error.

@sobojack Sorry it has taken me so long to respond! The only solution I found was to switch to docker. I don’t know if that’s of any use to you. Please feel free to attempt to connect with gen_tcp to confirm that postgres is running / available / connected and that this is the same issue.

@D4no0 Did you read the part of my initial question where I was able to connect to Postgres with gen_tcp, but not with Postgrex? For me, this issue had something to do with Podman’s networking being different from Docker’s, a difference that is revealed by how Postgrex uses gen_tcp. Please believe me when I say I’ve tried with and without the healthcheck.