Recently, because of a hardware issue at our hosting provider, the production DB was shut down and the app was moved to a new DB server. The log filled with lines like the following:
```
State: %Oban.Peers.Postgres.State{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Basic, get_dynamic_repo: nil, log: false, name: Oban, node: "codex", notifier: Oban.Notifiers.Postgres, peer: Oban.Peers.Postgres, plugins: [{Oban.Plugins.Reindexer, []}, {Oban.Plugins.Pruner, [max_age: 300]}], prefix: "public", queues: [default: [limit: 10], imports: [limit: 10], mailer: [limit: 10]], repo: SupplierLink.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, name: {:via, Registry, {Oban.Registry, {Oban, Oban.Peer}}}, timer: nil, interval: 30000, leader?: false, leader_boost: 2}
[error] GenServer {Oban.Registry, {Oban, Oban.Stager}} terminating
** (stop) exited in: GenServer.call(#PID<0.824.0>, :leader?, 5000)
    ** (EXIT) an exception was raised:
        ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 1997ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information

            (db_connection 2.5.0) lib/db_connection.ex:972: DBConnection.transaction/3
            (oban 2.16.3) lib/oban/peers/postgres.ex:94: anonymous fn/2 in Oban.Peers.Postgres.handle_info/2
            (telemetry 1.2.1) /Users/cjw/dev/supplier_link/deps/telemetry/src/telemetry.erl:321: :telemetry.span/3
            (oban 2.16.3) lib/oban/peers/postgres.ex:92: Oban.Peers.Postgres.handle_info/2
            (stdlib 5.0.2) gen_server.erl:1067: :gen_server.try_handle_continue/3
            (stdlib 5.0.2) gen_server.erl:977: :gen_server.loop/7
            (stdlib 5.0.2) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
    (elixir 1.15.7) lib/gen_server.ex:1074: GenServer.call/3
    (oban 2.16.3) lib/oban/peer.ex:99: Oban.Peer.leader?/2
    (oban 2.16.3) lib/oban/stager.ex:101: Oban.Stager.check_leadership_and_stage/1
    (oban 2.16.3) lib/oban/stager.ex:75: anonymous fn/2 in Oban.Stager.handle_info/2
    (telemetry 1.2.1) /Users/cjw/dev/supplier_link/deps/telemetry/src/telemetry.erl:321: :telemetry.span/3
    (oban 2.16.3) lib/oban/stager.ex:74: Oban.Stager.handle_info/2
    (stdlib 5.0.2) gen_server.erl:1077: :gen_server.try_handle_info/3
    (stdlib 5.0.2) gen_server.erl:1165: :gen_server.handle_msg/6
    (stdlib 5.0.2) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
Last message: :stage
```
These log messages repeat every few seconds, seemingly without limit. Is there a way to configure Phoenix/Oban/Postgrex to apply some sort of backoff strategy when connecting to the database? And would it be possible to detect that the connection has returned and reconnect gracefully? Thanks!
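For context, my understanding is that the knobs named in the error message, plus DBConnection's reconnect backoff options, all live in the repo config. Here is a sketch of what I believe the relevant settings look like (the values are illustrative, not my actual config, and `config/runtime.exs` is just where I'd assume they go):

```elixir
# config/runtime.exs — illustrative values only
config :supplier_link, SupplierLink.Repo,
  pool_size: 10,           # max concurrent connections in the pool
  queue_target: 500,       # ms a checkout may wait before the pool is considered overloaded
  queue_interval: 5_000,   # ms window over which :queue_target is evaluated
  # DBConnection already retries lost connections with backoff; these are
  # the options I understand control it (shown with what I believe are the defaults):
  backoff_type: :rand_exp, # randomized exponential backoff between reconnect attempts
  backoff_min: 1_000,      # ms, first retry delay
  backoff_max: 30_000      # ms, cap on the retry delay
```

Even if the connection processes themselves back off, though, I suspect the Oban peer/stager would still log an error on each poll while the database is unreachable, so I'm not sure tuning these alone would quiet the log.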