Strange DBConnection.ConnectionError with Ecto 3.0.6

I’m not sure how to interpret the following error. What is the relationship of PID 0.553.0 to 0.2195.0? It seems that 0.553.0 may be the test process in which case the error doesn’t make sense to me because (as I’m interpreting it) the client and the owner are the same process.

03:50:50.547 [error] Postgrex.Protocol (#PID<0.553.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.2195.0> checked in the connection

Client #PID<0.2195.0> is still using a connection from owner at location:

    :prim_inet.recv0/3
    (postgrex) lib/postgrex/protocol.ex:2701: Postgrex.Protocol.msg_recv/4
    (postgrex) lib/postgrex/protocol.ex:2454: Postgrex.Protocol.recv_close/3
    (postgrex) lib/postgrex/protocol.ex:1169: Postgrex.Protocol.close_parse_describe_flush/3
    (postgrex) lib/postgrex/protocol.ex:1587: Postgrex.Protocol.handle_prepare_execute/4
    (ecto_sql) lib/ecto/adapters/sql/sandbox.ex:375: Ecto.Adapters.SQL.Sandbox.Connection.proxy/3
    (db_connection) lib/db_connection/holder.ex:268: DBConnection.Holder.holder_apply/4
    (db_connection) lib/db_connection.ex:1189: DBConnection.run_execute/5
    (db_connection) lib/db_connection.ex:1276: DBConnection.run/6
    (db_connection) lib/db_connection.ex:550: DBConnection.execute/4
    (ecto_sql) lib/ecto/adapters/postgres/connection.ex:80: Ecto.Adapters.Postgres.Connection.execute/4
    (ecto_sql) lib/ecto/adapters/sql.ex:571: Ecto.Adapters.SQL.execute!/4
    (ecto_sql) lib/ecto/adapters/sql.ex:553: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:147: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:18: Ecto.Repo.Queryable.all/3
    test/hobnob_messages/queries/mentions_query_test.exs:101: HobnobMessages.Queries.MentionsQueryTest."test unread_in_threads_for_user/1 includes messages in threads when the user has no thread member"/1
    (ex_unit) lib/ex_unit/runner.ex:312: ExUnit.Runner.exec_test/1
    (stdlib) timer.erl:166: :timer.tc/1
    (ex_unit) lib/ex_unit/runner.ex:251: anonymous fn/4 in ExUnit.Runner.spawn_test/3

The connection itself was checked out by #PID<0.2195.0> at location:

    (ecto_sql) lib/ecto/adapters/postgres/connection.ex:80: Ecto.Adapters.Postgres.Connection.execute/4
    (ecto_sql) lib/ecto/adapters/sql.ex:571: Ecto.Adapters.SQL.execute!/4
    (ecto_sql) lib/ecto/adapters/sql.ex:553: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:147: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:18: Ecto.Repo.Queryable.all/3
    test/hobnob_messages/queries/mentions_query_test.exs:101: HobnobMessages.Queries.MentionsQueryTest."test unread_in_threads_for_user/1 includes messages in threads when the user has no thread member"/1
    (ex_unit) lib/ex_unit/runner.ex:312: ExUnit.Runner.exec_test/1
    (stdlib) timer.erl:166: :timer.tc/1
    (ex_unit) lib/ex_unit/runner.ex:251: anonymous fn/4 in ExUnit.Runner.spawn_test/3

The failure is intermittent but happens quite often on a full test run. I don’t get the same error on Ecto 2.2.11.

Environment:

  • ecto: 3.0.6
  • ecto_sql: 3.0.4
  • postgrex: 0.14.1
  • Elixir: 1.7.4
  • Erlang: 21.0.4
  • Mac OS X: 10.13.6

Blind shot in the dark but lately we had an intermittent error during tests (not the same as yours though) and we found that upgrading Erlang/OTP eliminated it. Can you try?

1 Like

No luck, still getting the same error on Elixir 1.8.1 and Erlang 21.2.4

1 Like

Does anyone have any ideas on the error?

The fact that these are all the same PIDs is what boggles me:

(DBConnection.ConnectionError) owner #PID<0.2195.0> checked in the connection

Client #PID<0.2195.0> is still using a connection from owner ...

The connection itself was checked out by #PID<0.2195.0>

Yup, it looks like a DBConnection bug. What is your DBConnection version?

1 Like

The only scenario I can see it happening is if you have your tests as async but you are changing the mode of the sandbox. Or you have another process changing the sandbox mode while this test runs. Can you please get all of the places you call Ecto.Adapters.SQL.Sandbox and put the code snippets here?

1 Like

Yes!! That was exactly it. Thanks for the tip. I had a 1.5 year-old test that looked like this:

    test "it sends a notification to the user channel if they are a member that is not present",
         %{user: user, room: room} do
      # This test uses another process
      Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, {:shared, self()})
      
      # rest of the test
    end

I think my thinking was that I wanted just this test to use the shared mode of the sandbox and since every test runs in a separate process that Sandbox.mode would set the mode for that specific process (since you’re passing the pid in).

From the docs it seemed okay:

iex(3)> h Ecto.Adapters.SQL.Sandbox.mode

                              def mode(repo, mode)

Sets the mode for the repo pool.

The mode can be :auto, :manual or {:shared, <pid>}.

Although I guess I should have realized that the “repo pool” is a shared resource across multiple tests.

But for my understanding, this is what I think was happening:

  1. Multiple tests are running concurrently (since we’re using async: true)
  2. This test called Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, {:shared, self()})
  3. When the DBConnection.Ownership.Manager GenServer received the mode change request it checked in all of the existing transaction checkouts, including for tests that were still running!
  4. The tests that were still running tried to continue to use their connection, but the Manager GenServer had already checked them in resulting in the warning at the top of this thread.
  5. The tests that were still running would then check out a new connection, and since this new connection was running in a new db transaction, any database entries inserted in the previous connection no longer exist which caused related test failures such as foreign key exceptions

I plan to create a doc PR to add some sort of warning to the Ecto.Adapters.SQL.Sandbox.mode function, unless this can be considered a true db_connection bug (I’m using db_connection 2.0.5). It would be nice if Ecto.Adapters.SQL.Sandbox.mode would eventually cause a warning if it forces the check-in of unrelated existing check-outs. Also this part of the original error message is not technically correct in this scenario:

(DBConnection.ConnectionError) owner #PID<0.2195.0> checked in the connection

Since #PID<0.2195.0> did not initiate the check-in (at least this is that is my current understanding). It looks like you’re in the process of adding an improvement for this case. That PR changes the error message to #PID<0.3100.0> checked in the connection owned by #PID<0.3096.0> which is a definite improvement since #PID<0.3100.0> was the process that was running the test that called Sandbox.mode.

Also due to the new $callers tracking in db_connection that test runs just fine in async mode without having to change the sandbox pool mode.

Thanks again for the help in tracking this down!

4 Likes

It might be helpful if we have a function along the lines of Ecto.Adapters.SQL.Sandbox.moded_run function (akin to unboxed_run) that accepts a function and uses the specified mode only in it – if that’s at all doable of course. Since the mode is a shared resource I’m guessing my suggestion is unrealistic.

1 Like

@josevalim should I file this as a bug on ecto_sql? Or is it an issue on db_connection?

Hi @axelson, there is no bug here. So the best would be a PR to improve the docs. :+1: We have already improved the error message on DBConnection master.

1 Like

Okay, I’ll prepare a docs PR :+1:

I am using Task.async_stream function in one of my functions, which updates a list of datas in the DB. While running tests over this main function, it throws
Client #PID<0.773.0> is still using a connection from owner at location: this error. How can I fix this?

In my case, I had the following line in both test_helper.exs and setup block.

Ecto.Adapters.SQL.Sandbox.mode(Blog.Repo, {:shared, self()})

Removing the line from test_helper.exs solved the problem.

3 Likes

If you are writing test for a method where you have written a Task.async call, you can use :timer.sleep(<a minimum amount of time in milisecond>) would help. As it is not getting a result which is being done by another process, so wait a bit to let it complete.

Or the async Task can just send a message to the parent process. That makes it both more stable and predictable in the original code and more testable as well. :slight_smile: