I’m having this strange problem with Elixir combined with Docker that I can’t find a similar incidence of elsewhere. I built a web app using Phoenix and dockerized it with docker-compose. It connects to a database and contacts various different services on the public internet. Recently I restarted my server after updating, and now I’m seeing errors like these in the logs:
** (exit) an exception was raised:
** (Protocol.UndefinedError) protocol Enumerable not implemented for {:error, :eaddrnotavail} of type Tuple. This protocol is implemented for the following type(s): Ecto.Adapters.SQL.Stream, Postgrex.Stream, DBConnection.Stream, DBConnection.PrepareStream, Floki.HTMLTree, Function, Range,
, Stream, List, GenEvent.Stream, HashDict, IO.Stream, File.Stream, HashSet
(elixir 1.10.3) lib/enum.ex:1: Enumerable.impl_for!/1
(elixir 1.10.3) lib/enum.ex:141: Enumerable.reduce/3
(elixir 1.10.3) lib/enum.ex:3383: Enum.map/2
(to_booru 0.1.0) lib/to_booru.ex:58: ToBooru.extract_uploads/2
(szurupull 0.1.0) lib/szurupull_web/controllers/upload_controller.ex:29: Szurupull.UploadController.extract/2
(szurupull 0.1.0) lib/szurupull_web/controllers/upload_controller.ex:1: Szurupull.UploadController.action/2
(szurupull 0.1.0) lib/szurupull_web/controllers/upload_controller.ex:1: Szurupull.UploadController.phoenix_controller_pipeline/2
(phoenix 1.5.7) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
As it turns out the Phoenix process in the container was no longer able to fetch any webpages on the external internet. Instead it gets back the error code :eaddrnotavail
. However, it was still connected to the database container, and I could still visit the webpage of the Phoenix app in my browser correctly, so it looks like the connection between the Docker containers is functioning properly.
My application uses Tesla underneath to handle fetching webpages, so I attached to the Docker container and tried retrieving it using iex
(by running /app/bin/my_app remote
). I got the same error of :eaddrnotavail
. For some reason it always fails after 16 seconds almost exactly.
iex> Tesla.client([Tesla.Middleware.Logger], {Tesla.Adapter.Hackney, [recv_timeout: 30000]}) |> Tesla.get("https://www.youtube.com")
{:error, :eaddrnotavail}
11:49:46.210 [error] GET https://www.youtube.com -> error: :eaddrnotavail (16026.201 ms)
11:49:46.214 [debug]
>>> REQUEST >>>
(no query)
(no headers)
(no body)
<<< RESPONSE ERROR <<<
:eaddrnotavail
If I use httpc
then it gives :econnrefused
as an error code instead. This time it always fails after nearly 8 seconds.
iex> Tesla.client([Tesla.Middleware.Logger], {Tesla.Adapter.Httpc, [recv_timeout: 30000]}) |> Tesla.get("https://www.youtube.com")
11:50:36.891 [info] [73, 110, 118, 97, 108, 105, 100, 32, 111, 112, 116, 105, 111, 110, 32, [123, ['recv_timeout', 44, '30000'], 125], 32, 105, 103, 110, 111, 114, 101, 100, 32, 10]
11:50:44.878 [error] GET https://www.youtube.com -> error: :econnrefused (8002.325 ms)
{:error, :econnrefused}
11:50:44.879 [debug]
>>> REQUEST >>>
(no query)
(no headers)
(no body)
<<< RESPONSE ERROR <<<
:econnrefused
It doesn’t work if I provide the IP address directly, either. The error becomes :econnrefused
.
iex> Tesla.client([Tesla.Middleware.Logger], {Tesla.Adapter.Hackney, [recv_timeout: 30000]}) |> Tesla.get("https://127.217.14.206")
{:error, :econnrefused}
iex(szurupull@e5e30164a29a)2>
12:20:10.467 [error] GET https://127.217.14.206 -> error: :econnrefused (2.721 ms)
12:20:10.469 [debug]
>>> REQUEST >>>
(no query)
(no headers)
(no body)
<<< RESPONSE ERROR <<<
:econnrefused
Of course, this works if I use iex -S mix
from my host machine outside the container:
iex(1)> Tesla.client([Tesla.Middleware.Logger], {Tesla.Adapter.Hackney, [recv_timeout: 30000]}) |> Tesla.get("www.youtube.com")
[warn] GET www.youtube.com -> 301 (48.126 ms)
[debug]
>>> REQUEST >>>
(no query)
(no headers)
(no body)
<<< RESPONSE <<<
content-type: application/binary
x-content-type-options: nosniff
cache-control: no-cache, no-store, max-age=0, must-revalidate
pragma: no-cache
expires: Mon, 01 Jan 1990 00:00:00 GMT
date: Sun, 27 Dec 2020 12:23:20 GMT
location: https://www.youtube.com/
x-frame-options: SAMEORIGIN
server: ESF
content-length: 0
x-xss-protection: 0
{:ok,
%Tesla.Env{
__client__: %Tesla.Client{
adapter: {Tesla.Adapter.Hackney, :call, [[recv_timeout: 30000]]},
fun: nil,
post: [],
pre: [{Tesla.Middleware.Logger, :call, [[]]}]
},
__module__: Tesla,
body: "",
headers: [
{"content-type", "application/binary"},
{"x-content-type-options", "nosniff"},
{"cache-control", "no-cache, no-store, max-age=0, must-revalidate"},
{"pragma", "no-cache"},
{"expires", "Mon, 01 Jan 1990 00:00:00 GMT"},
{"date", "Sun, 27 Dec 2020 12:23:20 GMT"},
{"location", "https://www.youtube.com/"},
{"x-frame-options", "SAMEORIGIN"},
{"server", "ESF"},
{"content-length", "0"},
{"x-xss-protection", "0"}
],
method: :get,
opts: [],
query: [],
status: 301,
url: "www.youtube.com"
}}
But if I connect to the container from the host with docker-compose exec my_container sh
and use curl
, I am still able to retrieve the site normally. This makes me suspect this is an issue with the Elixir side somehow.
/app # curl -v "https://www.youtube.com"
...
> GET / HTTP/2
> Host: www.youtube.com
> User-Agent: curl/7.64.0
> Accept: */*
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [264 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
} [5 bytes data]
< HTTP/2 200
< content-type: text/html; charset=utf-8
< x-content-type-options: nosniff
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: Mon, 01 Jan 1990 00:00:00 GMT
< date: Sun, 27 Dec 2020 11:47:34 GMT
< x-frame-options: SAMEORIGIN
< strict-transport-security: max-age=31536000
< p3p: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=en for more info."
< server: ESF
< x-xss-protection: 0
< set-cookie: YSC=WWanzXxZmHI; Domain=.youtube.com; Path=/; Secure; HttpOnly; SameSite=none
< set-cookie: VISITOR_INFO1_LIVE=kFXTUXRePU4; Domain=.youtube.com; Expires=Fri, 25-Jun-2021 11:47:34 GMT; Path=/; Secure; HttpOnly; SameSite=none
< alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
< accept-ranges: none
< vary: Accept-Encoding
...
And some sites do not give any errors at all from the Elixir side, like example.com
. But the ones that are important for my service to run all return errors. I’m not sure what the differentiating factor is between the sites that work and the ones that don’t.
iex> Tesla.client([Tesla.Middleware.Logger], {Tesla.Adapter.Hackney, [recv_timeout: 30000]}) |> Tesla.get("https://example.com")
11:54:36.278 [info] GET https://example.com -> 200 (8045.135 ms)
11:54:36.278 [debug]
>>> REQUEST >>>
(no query)
(no headers)
(no body)
<<< RESPONSE <<<
age: 252244
cache-control: max-age=604800
content-type: text/html; charset=UTF-8
date: Sun, 27 Dec 2020 11:54:36 GMT
etag: "3147526947+ident"
expires: Sun, 03 Jan 2021 11:54:36 GMT
last-modified: Thu, 17 Oct 2019 07:18:26 GMT
server: ECS (sec/96EE)
vary: Accept-Encoding
x-cache: HIT
content-length: 1256
Nothing changes even after I stop/remove/rebuild/restart the container with docker-compose, or restart the docker daemon.
I should also mention that I use an internal DNS server that forwards to 8.8.8.8
, but I’m still able to retrieve sites on it using curl
from within the container.
Here are the relevant parts of docker-compose.yml
:
version: '3.3'
services:
szurupull:
build:
context: /home/ruin/build/work/szurupull
ports:
- 4000:4000
networks:
- misaka
depends_on:
- szurupull_db
environment:
- SECRET_KEY_BASE=${SECRET_KEY_BASE}
- DATABASE_HOST=szurupull_db
- DATABASE_URL=ecto://postgres:postgres@szurupull_db/postgres
- VIRTUAL_HOST=<...>
- VIRTUAL_PORT=4000
- LETSENCRYPT_HOST=<...>
- UID=1000
- GID=1000
szurupull_db:
image: postgres:9.6
volumes:
- "/mnt/hibiki/config/szurupull/sql:/var/lib/postgresql/data"
networks:
- misaka
environment:
- POSTGRES_DB=postgres
networks:
misaka:
external: true
I run the app in release mode after compiling it with mix compile
and mix release
. (I followed this guide.) Here is the Dockerfile:
FROM elixir:1.10.3-alpine as build
# install build dependencies
RUN apk add --update git build-base nodejs npm yarn python
RUN mkdir /app
WORKDIR /app
# install Hex + Rebar
RUN mix do local.hex --force, local.rebar --force
# set build ENV
ENV MIX_ENV=prod
# install mix dependencies
COPY mix.exs mix.lock ./
COPY config config
RUN mix deps.get --only $MIX_ENV
RUN mix deps.compile
# build assets
COPY assets assets
RUN cd assets && npm install && npm run deploy
RUN mix phx.digest
# build project
COPY priv priv
COPY lib lib
RUN mix compile
# build release
# at this point we should copy the rel directory but
# we are not using it so we can omit it
# COPY rel rel
RUN mix release
# prepare release image
FROM alpine:3.9 AS app
# install runtime dependencies
RUN apk add --update bash openssl postgresql-client curl
EXPOSE 4000
ENV MIX_ENV=prod
# prepare app directory
RUN mkdir /app
WORKDIR /app
# copy release to app container
COPY --from=build /app/_build/prod/rel/szurupull .
COPY entrypoint.sh .
RUN chown -R nobody: /app
USER nobody
ENV HOME=/app
CMD ["bash", "/app/entrypoint.sh"]
And entrypoint.sh
:
#!/bin/bash
# docker entrypoint script.
# assign a default for the database_user
DB_USER=${DATABASE_USER:-postgres}
# wait until Postgres is ready
while ! pg_isready -q -h $DATABASE_HOST -p 5432 -U $DB_USER
do
echo "$(date) - waiting for database to start"
sleep 2
done
bin="/app/bin/szurupull"
eval "$bin eval \"Szurupull.Release.migrate\""
# start the elixir application
exec "$bin" "start"
I tried adjusting the url:
option that the app listens on in config.exs
to {127, 0, 0, 1}
, but it doesn’t change anything. I also haven’t changed any of the application code since I restarted the server either.
config :szurupull, SzurupullWeb.Endpoint,
url: [host: "localhost"]
Here is the result of running a few debugging commands from within the container.
/app # ulimit -a
-f: file size (blocks) unlimited
-t: cpu time (seconds) unlimited
-d: data seg size (kb) unlimited
-s: stack size (kb) 8192
-c: core file size (blocks) unlimited
-m: resident set size (kb) unlimited
-l: locked memory (kb) 64
-p: processes unlimited
-n: file descriptors 1048576
-v: address space (kb) unlimited
-w: locks unlimited
-e: scheduling priority 0
-r: real-time priority 0
/app # netstat -an | grep -e tcp -e udp | wc -l
27
/app # netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.11:34115 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:40591 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:46545 0.0.0.0:* LISTEN 216/beam.smp
tcp 0 0 0.0.0.0:4369 0.0.0.0:* LISTEN -
tcp 0 0 :::4000 :::* LISTEN -
tcp 0 0 :::4369 :::* LISTEN -
udp 0 0 127.0.0.11:57260 0.0.0.0:* -
Is there some sort of Elixir or Docker configuration I need so the connections can succeed again?