(DBConnection.ConnectionError) tcp recv: closed (the connection was closed by the pool, possibly due to a timeout or because the pool has been terminated)

Stack trace
DBConnection.ConnectionError: (DBConnection.ConnectionError) tcp recv: closed (the connection was closed by the pool, possibly due to a timeout or because the pool has been terminated)
(ecto_sql) lib/ecto/adapters/sql.ex:612: Ecto.Adapters.SQL.raise_sql_call_error/1
(ecto_sql) lib/ecto/adapters/sql.ex:545: Ecto.Adapters.SQL.execute/5
(ecto) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
(ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
(diamond) lib/diamond/concepts/experiment/update_count_for_each_running_experiment.ex:21: Diamond.Concepts.Experiment.UpdateCountForEachRunningExperiment.fetch_count/1
(diamond) lib/diamond/concepts/experiment/update_count_for_each_running_experiment.ex:10: Diamond.Concepts.Experiment.UpdateCountForEachRunningExperiment.call/0
(quantum) lib/quantum/executor.ex:141: anonymous fn/2 in Quantum.Executor.run/4
(elixir) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
(quantum) anonymous fn() in Quantum.Executor.run/4

config.exs

config :diamond, Diamond.Scheduler,
  jobs: [
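    # note: "*/60" in the minutes field only matches minute 0, i.e. this job runs hourly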
    {"*/60 * * * *", {Diamond.Concepts.Experiment.UpdateCountForEachRunningExperiment, :call, []}}
  ]

Also, sometimes I get:

15:00:15.060 [error] Postgrex.Protocol (#PID<0.2309.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.28048.4> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.28048.4> was at location:

    :prim_inet.recv0/3
    (postgrex) lib/postgrex/protocol.ex:2837: Postgrex.Protocol.msg_recv/4
    (postgrex) lib/postgrex/protocol.ex:1879: Postgrex.Protocol.recv_bind/3
    (postgrex) lib/postgrex/protocol.ex:1771: Postgrex.Protocol.bind_execute/4
    (db_connection) lib/db_connection/holder.ex:316: DBConnection.Holder.holder_apply/4
    (db_connection) lib/db_connection.ex:1255: DBConnection.run_execute/5
    (db_connection) lib/db_connection.ex:1342: DBConnection.run/6
    (db_connection) lib/db_connection.ex:539: DBConnection.parsed_prepare_execute/5

I have been getting this error in production since ages, and increasing the interval in the cron schedule didn't help.
Can anyone please help me here?

Thanks

This error usually happens when a query takes too long to execute on the database side. If you post the code of UpdateCountForEachRunningExperiment.fetch_count/1 and relevant schemas, folks will be able to help spot obvious performance traps.
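
In the meantime, you can raise the timeout for that one call so the job at least survives — a stopgap, not a fix, since it only gives the slow query more room:

# Default query timeout is 15_000 ms; this gives a single call more time.
# The underlying query still needs to get faster.
Repo.all(query, timeout: :timer.minutes(2))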


Here is the code for UpdateCountForEachRunningExperiment.fetch_count/1:

def fetch_version_wise_count(experiment_ids) do
  from(esl in Diamond.Models.ExperimentSplitLog,
    where: esl.experiment_id in ^experiment_ids,
    group_by: [esl.experiment_id, esl.rule_set_version_id],
    select: [count(), esl.rule_set_version_id, esl.experiment_id]
  )
  |> Repo.all()
end

Some questions in-line:

from(esl in Diamond.Models.ExperimentSplitLog,
  # How many rows in ^ this table?
  where: esl.experiment_id in ^experiment_ids,
  # Is there an index on ^ this column?
  group_by: [esl.experiment_id, esl.rule_set_version_id],
  #                      what about ^ this column?
  select: [count(), esl.rule_set_version_id, esl.experiment_id]
  # how many rows do you expect in the result?
)

This will help determine why the query is timing out.
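
If your ecto_sql is 3.4 or newer, you can also ask Postgres for the query plan directly from iex. A sketch, assuming experiment_ids is bound to a real list — note that analyze: true actually executes the query, so it can take a while:

import Ecto.Query

query =
  from(esl in Diamond.Models.ExperimentSplitLog,
    where: esl.experiment_id in ^experiment_ids,
    group_by: [esl.experiment_id, esl.rule_set_version_id],
    select: [count(), esl.rule_set_version_id, esl.experiment_id]
  )

# A "Seq Scan on experiment_split_log" in the output confirms a missing index.
Diamond.Repo
|> Ecto.Adapters.SQL.explain(:all, query, analyze: true, timeout: :timer.minutes(5))
|> IO.puts()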

How many rows in the experiment_split_log table?: 279,517,334

Is there an index on that column?: a foreign key constraint

How many rows do you expect in the result?: It depends on the condition.

Also, after the above query is executed, the query below runs:

def update_cached_count(count, experiment_id, version_id) do
  from(evm in @model,
    where: evm.experiment_id == ^experiment_id and evm.rule_set_version_id == ^version_id
  )
  |> Repo.update_all(set: [cached_count: count])
end

Thanks.

A foreign key constraint requires an index / primary key on the referenced column (id in the experiments table) but does NOT create or require an index on the referring column (experiment_id).

Sequential-scanning 279 million rows without an index? You’re Gonna Have A Bad Time.
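
You can check what Postgres actually has from iex — plain SQL through the repo, table name taken from the thread:

# Lists every index on the table; an index on experiment_id would show up here.
Diamond.Repo.query!(
  "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'experiment_split_log'"
)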


We are using a Postgres database, and the query to list indexes gave me this output.

Can you help me tell whether there is a proper index or not?

Thanks.

There is not. You’ll want an index covering experiment_split_log.experiment_id; otherwise the only way Postgres can find rows satisfying this query is by checking each of the 279 million…

from(esl in Diamond.Models.ExperimentSplitLog,
  where: esl.experiment_id in ^experiment_ids,
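
Something like the migration below would add it (a sketch — the module name is made up). On a table this size you will want concurrently: true so writes aren't blocked while the index builds, which in turn requires disabling the DDL transaction and the migration lock:

defmodule Diamond.Repo.Migrations.AddExperimentIdIndexToExperimentSplitLog do
  use Ecto.Migration

  # CREATE INDEX CONCURRENTLY cannot run inside a transaction.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    create index(:experiment_split_log, [:experiment_id], concurrently: true)
  end
end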

Thanks for the help.