[error] MyXQL.Connection (#PID<0.265.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.567.0> exited

Laetitia · July 26, 2020, 7:42pm

Hi everybody,
i have an issue with my test suite. When i run each test separatly, everything is ok but in the whole test suite i have some DBConnection issues, althought all test pass.
Here is my case:
I have a Supervisor (a module that use Supervisor). This supervisor starts a scheduler. It is a genserver polling the DB periodically.
In the test, i start_supervised! this supervisor and do a few assert. But i get the following error:

MyXQL.Connection (#PID<0.265.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.567.0> exited

Client #PID<0.572.0> is still using a connection from owner at location:

:prim_inet.recv0/3
(myxql) lib/myxql/client.ex:171: MyXQL.Client.recv_packets/4
(myxql) lib/myxql/connection.ex:102: MyXQL.Connection.handle_execute/4
(ecto_sql) lib/ecto/adapters/sql/sandbox.ex:370: Ecto.Adapters.SQL.Sandbox.Connection.proxy/3
(db_connection) lib/db_connection/holder.ex:316: DBConnection.Holder.holder_apply/4
(db_connection) lib/db_connection.ex:1255: DBConnection.run_execute/5
(db_connection) lib/db_connection.ex:1342: DBConnection.run/6
(db_connection) lib/db_connection.ex:595: DBConnection.execute/4
(ecto_sql) lib/ecto/adapters/myxql/connection.ex:28: Ecto.Adapters.MyXQL.C
....

The Client #PID<0.572.0> is my scheduler and the owner is the test process.
As i read in the Ecto.Adapters.Sql.Sandbox, it seems to be normal: the test exits, the DBConnection lose its owner and notify the client process it doesn’t have owner anymore. But… the scheduler is querying the DB and is waiting for a reply. (https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.Sandbox.html#module-faq)

I would like to know how to prevent such errors (too many red in the screen ;( )

Also, start_supervised is aimed to start the process under the test supervisor and the lifecycle of the test process is:

the test process is spawned
it runs setup/2 callbacks
it runs the test itself
it stops all supervised processes
the test process exits with reason :shutdown
on_exit/2 callbacks are executed in a separate process
(https://hexdocs.pm/ex_unit/ExUnit.Callbacks.html)

It seems that my scheduler is still alive after the process exit. Why it is not killed with my supervisor at step 4?

thank you for your responses

ityonemo · July 26, 2020, 9:29pm

Conceptually everything looks good. Can you show us the body of one test?

Laetitia · July 27, 2020, 5:08am

Here is:

test "a slow queue don't slow down the others" do
      scope = TestQueuetopia.scope()
      timeout = 600

      start_supervised!({TestQueuetopia, [poll_interval: 100]})

      %{id: id, queue: slow_queue} =
        Factory.insert( :slow_job, scope: scope, params: %{"duration" => 500}, timeout: timeout)

      %{queue: fast_queue} = Factory.insert(:success_job)
      _ = Factory.insert(:success_job, scope: scope, queue: fast_queue)

      :timer.sleep(250)
      assert is_nil(Jobs.get_next_pending_job(TestRepo, scope, fast_queue))
      assert %Job{id: ^id} = Jobs.get_next_pending_job(TestRepo, scope, slow_queue)

      _ = Factory.insert(:success_job, scope: scope, queue: fast_queue, timeout: timeout)
      :timer.sleep(500)
      assert is_nil(Jobs.get_next_pending_job(TestRepo, scope, fast_queue))
      assert is_nil(Jobs.get_next_pending_job(TestRepo, scope, slow_queue))
    end

Laetitia · July 27, 2020, 6:35am

If i add a :timer.sleep(500) at the end of the test, i don’t get the errors anymore. But i feel its a patch and i would like to understand more deeply what happens.
Also, if i kill the scheuler at the end of the test, i get another error:

i am down 2020-07-27 06:28:27.774081Z: #PID<0.603.0> (# it is a IO.inspect on the scheduler terminate callback)

09:28:27.774 [error] GenServer Queuetopia.TestQueuetopia.Scheduler terminating
** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.603.0>.

When using ownership, you must manage connections in one
of the four ways:

* By explicitly checking out a connection
* By explicitly allowing a spawned process
* By running the pool in shared mode
* By using :caller option with allowed process

The first two options require every new process to explicitly
check a connection out or be allowed by calling checkout or
allow respectively.

The third option requires a {:shared, pid} mode to be set.
If using shared mode in tests, make sure your tests are not
async.

The fourth option requires [caller: pid] to be used when
checking out a connection from the pool. The caller process
should already be allowed on a connection.

If you are reading this error, it means you have not done one
of the steps above or that the owner process has crashed.

And its normal, because i kill it.

al2o3cr · July 27, 2020, 10:06am

It’s sent a message requesting a shutdown in step 4, but based on the error message the scheduler process is waiting for a reply from the DBConnection process. Even shutdown has to go through the message queue.

The core of a solution to this is ensuring the scheduler isn’t blocked before shutting down; sending any synchronous message will force the test process to wait until your scheduler process is handling messages. For instance, calling :sys.get_state(scheduler_name) will work

Laetitia · July 27, 2020, 12:09pm

It’s sent a message requesting a shutdown in step 4, but based on the error message the scheduler process is waiting for a reply from the DBConnection process. Even shutdown has to go through the message queue

Oh! Yes! That was the point !! Thank you very much!

j4y · December 2, 2024, 2:09am

Did you ever find a solution to this? I have the same problem.