Postgrex connection timeout error in Phoenix app

Hi everyone,

I’m running into a database connection issue in my Phoenix app, and the error reported in Sentry doesn’t give me any useful stacktrace beyond this:

    Postgrex.Protocol (#PID<0.4692.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.1503867.0> timed out because it queued and checked out the connection for longer than 30000ms

    #PID<0.1503867.0> was at location:

    :prim_inet.recv0/3
    (postgrex 0.21.1) lib/postgrex/protocol.ex:3261: Postgrex.Protocol.msg_recv/4
    (postgrex 0.21.1) lib/postgrex/protocol.ex:2263: Postgrex.Protocol.recv_bind/3
    (postgrex 0.21.1) lib/postgrex/protocol.ex:2155: Postgrex.Protocol.bind_execute/4
    (db_connection 2.8.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.8.0) lib/db_connection.ex:1555: DBConnection.run_execute/5
    (db_connection 2.8.0) lib/db_connection.ex:826: DBConnection.execute/4
    (ecto_sql 3.12.1) lib/ecto/adapters/postgres/connection.ex:139: Ecto.Adapters.Postgres.Connection.execute/4

Since this is all I see in Sentry, I don’t know how to trace back which query or part of the code triggered the issue.

Has anyone dealt with this? How can I get more context or a clearer stacktrace when this kind of timeout happens?

Are there any other errors? This one likely caused a crash somewhere else, which should have a different stacktrace and could point to the problematic query.

There are indeed other errors, but I’m not sure if they are related. However, their counts don’t match the number of occurrences of this one.

Suspects could be:

* `lib/ecto/adapters/sql.ex` in `Ecto.Adapters.SQL.raise_sql_call_error/1` at line `1096`

* `lib/ecto/adapters/sql.ex` in `Ecto.Adapters.SQL.execute/6` at line `994`

* `lib/ecto/repo/queryable.ex` in `Ecto.Repo.Queryable.execute/4` at line `232`

* `lib/oban/engines/basic.ex` in `Oban.Engines.Basic.prune_jobs/3` at line `175`

* `lib/oban/engine.ex` in `anonymous fn/4 in Oban.Engine.prune_jobs/3` at line `272`

* `lib/oban/engine.ex` in `anonymous fn/3 in Oban.Engine.with_span/4` at line `387`

* `/app/deps/telemetry/src/telemetry.erl` in `:telemetry.span/3` at line `324`

* `lib/oban/plugins/pruner.ex` in `anonymous fn/1 in Oban.Plugins.Pruner.check_leadership_and_delete_jobs/1` at line `150`

I would check your pool_size in your config - are you using the default of 10? Is that appropriate for what your app is doing?


We have pool_size: 30 for the Oban config; we have actually increased it, but we still get the same result:


    config :app, App.Repo,
      socket_options: maybe_ipv6,
      url: database_url,
      pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
      queue_target: String.to_integer(System.get_env("QUEUE_TARGET") || "1000"),
      queue_interval: String.to_integer(System.get_env("QUEUE_TIMEOUT") || "10000"),
      timeout: String.to_integer(System.get_env("TIMEOUT") || "20000")

    config :app, App.RepoOban,
      socket_options: maybe_ipv6,
      url: database_url,
      pool_size: String.to_integer(System.get_env("OBAN_POOL_SIZE") || "30"),
      queue_target: String.to_integer(System.get_env("OBAN_QUEUE_TARGET") || "1000"),
      queue_interval: String.to_integer(System.get_env("OBAN_QUEUE_TIMEOUT") || "15000"),
      timeout: String.to_integer(System.get_env("OBAN_TIMEOUT") || "30000")

Perhaps the Oban pruner took more than 30s to prune jobs? If you export Oban metrics somewhere, you can check how long prune_jobs usually takes and whether there were any spikes and/or exceptions.
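
If those metrics aren’t exported anywhere yet, here is a minimal sketch of a handler that just logs every plugin run, assuming the [:oban, :plugin, :stop | :exception] events documented in Oban.Telemetry (the module name and handler id are made up; call App.ObanPluginLogger.attach() once during application start):

    defmodule App.ObanPluginLogger do
      require Logger

      def attach do
        :telemetry.attach_many(
          "oban-plugin-logger",
          [[:oban, :plugin, :stop], [:oban, :plugin, :exception]],
          &__MODULE__.handle_event/4,
          nil
        )
      end

      # Logs how long each plugin run took, e.g. Oban.Plugins.Pruner
      def handle_event([:oban, :plugin, :stop], %{duration: duration}, %{plugin: plugin}, _cfg) do
        Logger.info("#{inspect(plugin)} finished in #{to_ms(duration)}ms")
      end

      # Logs plugin runs that raised, with the reason from the event metadata
      def handle_event([:oban, :plugin, :exception], %{duration: duration}, meta, _cfg) do
        Logger.error("#{inspect(meta.plugin)} failed after #{to_ms(duration)}ms: #{inspect(meta[:reason])}")
      end

      defp to_ms(native), do: System.convert_time_unit(native, :native, :millisecond)
    end

Pruner runs show up with the plugin set to Oban.Plugins.Pruner, so any spike past the 30s mark should line up with these disconnects.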


The issue is entirely because of slow queries causing connection timeouts. For the pruner timeout, that may be from a very large backlog of jobs (if you just added pruning recently, or it’s not able to keep up). The pruner only deletes 10k jobs at a time every 1m though, so it’s minimal load on the database.
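
Those knobs are configurable on the plugin if they turn out to be too aggressive or too slow for your backlog. A rough sketch, assuming Oban is configured under config :app, Oban with App.RepoOban as its repo (the values only mirror the "10k per run, every minute" behaviour described above; check the Oban.Plugins.Pruner docs for your version before relying on them):

    config :app, Oban,
      repo: App.RepoOban,
      plugins: [
        # limit caps how many jobs get deleted per run, interval is in ms,
        # max_age (seconds) is how old a finished job must be before it is
        # eligible for pruning
        {Oban.Plugins.Pruner, limit: 10_000, interval: :timer.minutes(1), max_age: 60}
      ]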

You can use telemetry (Oban.Telemetry — Oban v2.20.1) to measure how long prune_jobs takes. However, a query timeout there indicates that something else is probably going on:

  1. Is the database busy elsewhere? Is it under-provisioned for RAM/CPU for your workload?
  2. Do you have a high number of cancelled/discarded jobs? If so, the v13 migration in Oban v2.20 would help with the pruning queries (a quick check and a migration sketch follow below).
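
For point 2, a quick way to gauge the backlog from iex, assuming Oban writes to App.RepoOban:

    import Ecto.Query

    # How many cancelled/discarded jobs are sitting in oban_jobs
    App.RepoOban.aggregate(
      from(j in Oban.Job, where: j.state in ["cancelled", "discarded"]),
      :count
    )

If that number is large, the v13 bump could look like this sketch (the migration module name is made up, and you would run it against whichever repo owns the oban_jobs table; the up/down pattern is the usual one from Oban’s upgrade guides):

    defmodule App.RepoOban.Migrations.UpdateObanJobsToV13 do
      use Ecto.Migration

      def up, do: Oban.Migrations.up(version: 13)
      def down, do: Oban.Migrations.down(version: 12)
    end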