Cannot acquire DBConnection after 1000 tests run during CircleCI tests while in :shared mode

We have our elixir project setup to run a circleci job where it invokes mix test. We are seeing recently and in increasing frequency this type of an error:

     19:57:51.320 [error] Task #PID<0.3093.0> started from #PID<0.3098.0> terminating
     ** (DBConnection.ConnectionError) could not checkout the connection owned by #PID<0.3072.0>. When using the sandbox, connections are shared, so this may imply another process is using a connection. Reason: connection not available and request was dropped from queue after 129ms. You can configure how long requests wait in the queue using :queue_target and :queue_interval. See DBConnection.start_link/2 for more information
         (ecto_sql 3.4.4) lib/ecto/adapters/sql.ex:593: Ecto.Adapters.SQL.raise_sql_call_error/1
         (ecto_sql 3.4.4) lib/ecto/adapters/sql.ex:526: Ecto.Adapters.SQL.execute/5
         (ecto 3.4.4) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
         (ecto 3.4.4) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
         (ecto 3.4.4) lib/ecto/repo/preloader.ex:203: Ecto.Repo.Preloader.fetch_query/8
         (ecto 3.4.4) lib/ecto/repo/preloader.ex:120: Ecto.Repo.Preloader.preload_assoc/10
         (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
         (elixir 1.10.3) lib/task/supervised.ex:35: Task.Supervised.reply/5
         (stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
     Function: &:erlang.apply/2
         Args: [#Function<8.71330160/1 in Ecto.Repo.Preloader.maybe_pmap/4>, [{{:assoc, %Ecto.Association.Has{cardinality: :many, defaults: [], field: :action_shootouts, on_cast: nil, on_delete: :nothing, on_replace: :delete, ordered: false, owner: TheScoreData.Schemas.HockeyBoxScore, owner_key: :id, queryable: TheScoreData.Schemas.HockeyActionShootout, related: TheScoreData.Schemas.HockeyActionShootout, related_key: :box_score_id, relationship: :child, unique: true, where: []}, {0, :box_score_id}}, nil, nil, []}]] error_logger=%{tag: :error_msg} report_cb=&Task.Supervised.format_report/1

Here is our setup code in our tests configuring the mode:

  # test/test_helper.exs
  ExUnit.start(capture_log: true)
  Ecto.Adapters.SQL.Sandbox.mode(Op.Repo, :manual)

  # test/support/data_case.ex
  setup tags do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(Op.Repo)

    unless tags[:async] do
      Ecto.Adapters.SQL.Sandbox.mode(Op.Repo, {:shared, self()})

We have a code smell in a test we recently integrated that passes after almost exactly 20 seconds each time, so we suspect we may have side-effects from another test impacting this one. We’re investigating that part of it.

I believe we are using the above configuration is the same you get with a new phoenix app and that is we created this app from, and I had a quick glance in a new phoenix app to confirm as well.

Another question, why the query timeout set to around 50ms? Is that… low? If there is a 50ms network interruption due to jitter/queued packets etc. stuff out of the control of the machine running the elixir code… would this error happen?


For those out there that run into this, we increased the time from 50ms to 1000ms and this problem went away.

diff --git config/releases.exs config/releases.exs
index c2bb2657f..217cc973a 100644
--- config/releases.exs
+++ config/releases.exs
@@ -13,10 +13,11 @@ config :op, Op.Repo,
   username: System.fetch_env!("MYSQL_USER"),
   password: System.fetch_env!("MYSQL_PASSWORD"),
   port: String.to_integer(System.fetch_env!("MYSQL_PORT")),
   pool_size: String.to_integer(System.fetch_env!("MYSQL_POOL_SIZE")),
   timeout: String.to_integer(System.fetch_env!("MYSQL_QUERY_TIMEOUT")),
+  queue_target: 1000,
   charset: "utf8mb4",
   prepare: :unnamed,
   log: :debug

I actually even increased it to 5000 in one of my last commercial Elixir projects. Made the app really lag-proof and very rarely somebody actually had to wait for more than 1000-2000 ms.