(DBConnection.ConnectionError) tcp recv: closed. Server doesn't recover

We have a large query that times out against PostgreSQL, which results in (DBConnection.ConnectionError) tcp recv: closed.
However, after that the app doesn't recover and I have to force-restart it in order for it to be reachable again.

  exit: ** (exit) %DBConnection.ConnectionError{message: "tcp recv: closed"}
  File "lib/ecto/adapters/postgres/connection.ex", line 86, in Ecto.Adapters.Postgres.Connection.prepare_execute/5
  File "lib/ecto/adapters/sql.ex", line 243, in Ecto.Adapters.SQL.sql_call/6
  File "lib/ecto/adapters/sql.ex", line 431, in Ecto.Adapters.SQL.execute_and_cache/7
  File "lib/ecto/repo/queryable.ex", line 130, in Ecto.Repo.Queryable.execute/5
  File "lib/ecto/repo/queryable.ex", line 35, in Ecto.Repo.Queryable.all/4
  File "lib/ecto/repo/preloader.ex", line 171, in Ecto.Repo.Preloader.fetch_query/8
  File "lib/ecto/repo/preloader.ex", line 111, in Ecto.Repo.Preloader.preload_assoc/10
  File "lib/task/supervised.ex", line 85, in Task.Supervised.do_apply/2
  Module "erlang", in :erlang.apply/2

We are using the following setup:
{:phoenix, "~> 1.2.4"},
{:phoenix_ecto, "~> 3.2"},
…

On the server
Erlang/OTP 19 [erts-8.3.5] [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]
psql (PostgreSQL) 9.6.5
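
For context, the timeout mentioned above is Ecto's per-query timeout, which defaults to 15 000 ms and can be raised in the repo configuration or per call. A minimal sketch, assuming the repo is MyApp.Repo under the :my_app OTP app (both names are placeholders):

config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  timeout: 60_000,      # per-query timeout in milliseconds (Ecto's default is 15_000)
  pool_timeout: 5_000   # how long to wait for a connection from the pool

The same option can also be passed to a single call, e.g. Repo.all(query, timeout: 60_000), though raising it only hides the problem if the query is genuinely too heavy.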

Is this the correct behaviour: killing the server completely?

It depends a bit on when and where you’re making the query. If it’s inside the init callback of a process, then yes, this is expected. Can you elaborate a bit more on the circumstances surrounding this query?

We are using ExAdmin and have around 20 records that have relationships with another table of 650,000 records. When trying to open that index page, it loads for 15-20 seconds, then returns a 502 Bad Gateway and the app dies. I know the query could definitely be improved. However, I’m more concerned about the application not getting back on its feet after it’s been knocked down.
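
As an aside, when the preload is the heavy part, Ecto can preload through a custom query so the associated rows can be scoped or capped. A rough sketch, assuming hypothetical Retailer and Participant schemas with a has_many :participants association:

import Ecto.Query

# Note: a limit in a preload query applies to the whole preload, not per parent row.
participants_query = from p in Participant, order_by: [desc: p.inserted_at], limit: 1_000

retailers =
  Retailer
  |> MyApp.Repo.all()
  |> MyApp.Repo.preload(participants: participants_query)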

Does your database become completely unresponsive? Do you run out of RAM on either the server or the DB instance? Query timeouts don’t normally cause application failures for me. Do you get just one of the errors you listed or a bunch of them?

So once I run it, it goes up to a maximum of about 1.91 GB. Then it jumps down… See attached.

After that it crashes and it goes down to around 22x MB.

Sorry, what’s “it”? The BEAM or Postgres?

Sorry, I meant the server’s used RAM when saying “it”. :slight_smile:

Here’s the log from PostgreSQL:

2017-11-22 18:03:52.053 UTC [26009] nike_raffle_user@nike_raffle_production LOG:  could not send data to client: Connection reset by peer
2017-11-22 18:03:52.053 UTC [26009] nike_raffle_user@nike_raffle_production STATEMENT:  SELECT p0."id", p0."retailer_id", p0."shoesize_id", p0."shoe_id", p0."first_name", p0."last_name", p0."email", p0."phone", p0."birthdate", p0."picked_up", p0."has_won", p0."is_bot", p0."notified", p0."inserted_at", p0."updated_at", p0."retailer_id" FROM "participants" AS p0 WHERE (p0."retailer_id" = ANY($1)) ORDER BY p0."retailer_id"
2017-11-22 18:03:52.056 UTC [26009] nike_raffle_user@nike_raffle_production FATAL:  connection to client lost
2017-11-22 18:03:52.056 UTC [26009] nike_raffle_user@nike_raffle_production STATEMENT:  SELECT p0."id", p0."retailer_id", p0."shoesize_id", p0."shoe_id", p0."first_name", p0."last_name", p0."email", p0."phone", p0."birthdate", p0."picked_up", p0."has_won", p0."is_bot", p0."notified", p0."inserted_at", p0."updated_at", p0."retailer_id" FROM "participants" AS p0 WHERE (p0."retailer_id" = ANY($1)) ORDER BY p0."retailer_id"

Looks like it’s the application itself that takes a huge amount of RAM. At around 70% of the server’s total RAM usage, it crashes.

Basically, I’m wondering if your OS is just killing it because it’s using up all the RAM.

Yeah, maybe it is. So you mean it’s an OS problem rather than the application itself?

Well, the OS is doing what it’s supposed to be doing. If a program is taking over all the RAM, it’s going to kill that program.

In a production environment you always want to have some form of supervision over the application itself. This can be something simple like systemd or init.d on the server, or something more complex like Kubernetes or ECS. In both cases it’s the job of those systems to restart the whole application should it go down for one reason or another.

In short, everything is doing the job it’s supposed to be doing. The problem here is a design problem: loading 650k DB records into RAM is not really what you want to do. The program tries its best, the OS stops it, and then there’s nothing in place to restart it.
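
For illustration, one way to avoid holding the whole result set in memory is to stream it inside a transaction and process it in batches. A minimal sketch, assuming a MyApp.Repo and a hypothetical Participant schema:

import Ecto.Query

retailer_ids = [1, 2, 3]   # placeholder ids

query = from p in Participant, where: p.retailer_id in ^retailer_ids

MyApp.Repo.transaction(fn ->
  query
  |> MyApp.Repo.stream(max_rows: 500)     # fetch 500 rows at a time from Postgres
  |> Stream.each(&handle_participant/1)   # handle_participant/1 is a placeholder
  |> Stream.run()
end, timeout: :infinity)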

I agree; however, I do have systemd configured, but maybe not in the correct way?

[Unit]
Description=Phoenix Application for ...
After=network.target

[Service]
Type=forking
User=root
Group=root
WorkingDirectory=/var/www/...
ExecStart=/var/www/... start
ExecStop=/var/www/... stop
Restart=on-failure
RestartSec=5
SyslogIdentifier=...
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

And does this work if you stop the application? Is it restarted automatically?
I think RemainAfterExit is not something you want to turn on, if I understand it correctly.
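
For reference, a sketch of the [Service] section with that change, keeping the paths abbreviated as in the original; with RemainAfterExit=yes set, systemd can keep considering the unit active after the process exits, so Restart=on-failure may never kick in:

[Service]
Type=forking
User=root
Group=root
WorkingDirectory=/var/www/...
ExecStart=/var/www/... start
ExecStop=/var/www/... stop
Restart=on-failure
RestartSec=5
# RemainAfterExit=yes removed, so the unit goes inactive/failed when the process exits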

One thing I came to realize after struggling to make systemd work correctly when re/starting apps was that environment variables have to be set in the systemd unit file (or somehow loaded into it), otherwise they won’t be available when the app is restarted.
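
For example, variables can be set directly in the unit or loaded from a file (the variable names and file path below are only placeholders):

[Service]
Environment=MIX_ENV=prod
Environment=PORT=4000
EnvironmentFile=/etc/default/myapp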

Also, if the application itself doesn’t fully crash but just errors out, systemd won’t do anything, and that has to be dealt with inside the app.