Tests randomly failing when run in a VM

Can you show the errors with which the tests fail?


Sure, here are a few:

[error] an exception was raised:
    ** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.2025.0>.


** (MatchError) no match of right hand side value: {:error, %{reason: "join crashed"}}
  test/app_web/channels/notification_channel_test.exs:27: AppWeb.NotificationChannelTest.__ex_unit_setup_1/1
  test/app_web/channels/notification_channel_test.exs:1: AppWeb.NotificationChannelTest.__ex_unit__/2


** (exit) exited in: GenServer.call({:via, Registry, {:messages_boards_registry, "team:df3bfc72-379b-4d95-b9e8-131c85c1e93e"}}, {:get, 40}, 5000)
    ** (EXIT) an exception was raised:
        ** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.2956.0>.


GenServer {:messages_boards_registry, "team:d8bade3c-8432-4ea0-8bf2-8af5b2a6d62a"} terminating
** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.2930.0>.


** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.2372.0>}}
  test/app/messages/message_board_test.exs:12: App.MessageBoardTest.__ex_unit_setup_1/1
  test/app/messages/message_board_test.exs:1: App.MessageBoardTest.__ex_unit__/2

Almost all tests fail due to this DBConnection.OwnershipError.

In this last error, however, it looks like the test is started before the process created by the previous test has died, which usually doesn’t happen since the tests run synchronously. Here’s the relevant code:

  use App.DataCase

  alias App.MessageBoard

  @name :test_server

  setup do
    project_id = UUID.uuid4()
    {:ok, _} = MessageBoard.start_link(project_id, [name: @name])
    {:ok, project_id: project_id}
  end

Looks like you need to use Ecto’s sandbox https://hexdocs.pm/ecto/Ecto.Adapters.SQL.Sandbox.html
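If the GenServer from your setup needs the database, you can also explicitly allow it on the test’s connection instead of relying on shared mode. A sketch, reusing the module names from your snippet:

```elixir
setup do
  project_id = UUID.uuid4()
  {:ok, pid} = MessageBoard.start_link(project_id, name: @name)

  # Explicitly allow the spawned GenServer to use the test process's
  # sandbox connection (one of the four ownership options).
  Ecto.Adapters.SQL.Sandbox.allow(App.Repo, self(), pid)

  {:ok, project_id: project_id}
end
```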

Thanks sneako for your reply.

I didn’t mention it, but this is a Phoenix app, and those failing tests use Ecto’s sandbox in shared mode via the DataCase behaviour (which comes out of the box with a Phoenix app):

(in test/support/data_case.ex):

  setup tags do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(App.Repo)

    unless tags[:async] do
      Ecto.Adapters.SQL.Sandbox.mode(App.Repo, {:shared, self()})
    end

    :ok
  end


Plus the code is the exact same between configurations…

We experience the same issue, but in our case it also happens outside a VM when the machine running the tests is fast enough.
It happens when a worker process is using the database connection: even when the test waits for the worker to finish, the ** (DBConnection.OwnershipError) cannot find ownership process error arises.

At the moment I am preparing a minimal test case to reproduce the error. I will post it here when it’s finished.

Here is the test case: https://github.com/semlabs/ecto_sandbox
and here you can see the failing test after a couple of tries: https://semaphoreci.com/semlabs/ecto_sandbox/branches/master/builds/25

It’s a very simple app:

  • Create supervised workers which run as GenServers (Module Worker)
  • Make a request with a specific worker name to a POST route sending some data (Module CollectorController)
  • The worker processes and stores the data in the database (Module Server)
  • Make sure the current test waits until the data was stored (Module Notification)
  • Enable ownership logging for db_connection
  • The test uses ConnCase which sets the ownership mode to shared
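The worker boils down to something like this (a simplified sketch; the module and schema names are guesses based on the logs, and the exact code is in the linked ecto_sandbox repository):

```elixir
defmodule EctoSandbox.Workers.Worker.Server do
  use GenServer

  # Hypothetical sketch of the worker described above.
  def start_link(name), do: GenServer.start_link(__MODULE__, nil, name: via(name))

  def store(name, data), do: GenServer.call(via(name), {:process, data})

  defp via(name), do: {:via, Registry, {Reg, name}}

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:process, data}, _from, state) do
    # Insert using the sandbox connection the test shares with us...
    {:ok, _} = create(data)
    # ...then notify subscribers so the test knows the insert finished.
    Phoenix.PubSub.broadcast(EctoSandbox.PubSub, "chunks", {:chunk_stored, "stored"})
    {:reply, :ok, state}
  end

  defp create(data), do: EctoSandbox.Repo.insert(%EctoSandbox.Chunk{data: data["data"]})
end
```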

As you can see the test randomly fails with:

** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.438.0>.

When using ownership, you must manage connections in one
of the four ways:

* By explicitly checking out a connection
* By explicitly allowing a spawned process
* By running the pool in shared mode
* By using :caller option with allowed process

The first two options require every new process to explicitly
check a connection out or be allowed by calling checkout or
allow respectively.

The third option requires a {:shared, pid} mode to be set.
If using shared mode in tests, make sure your tests are not
async.

The fourth option requires [caller: pid] to be used when
checking out a connection from the pool. The caller process
should already be allowed on a connection.

If you are reading this error, it means you have not done one
of the steps above or that the owner process has crashed.

See Ecto.Adapters.SQL.Sandbox docs for more information.
    (db_connection) lib/db_connection.ex:934: DBConnection.checkout/2
    (db_connection) lib/db_connection.ex:750: DBConnection.run/3
    (db_connection) lib/db_connection.ex:592: DBConnection.prepare_execute/4
    (ecto) lib/ecto/adapters/postgres/connection.ex:86: Ecto.Adapters.Postgres.Connection.execute/4
    (ecto) lib/ecto/adapters/sql.ex:256: Ecto.Adapters.SQL.sql_call/6
    (ecto) lib/ecto/adapters/sql.ex:542: Ecto.Adapters.SQL.struct/8
    (ecto) lib/ecto/repo/schema.ex:547: Ecto.Repo.Schema.apply/4
    (ecto) lib/ecto/repo/schema.ex:213: anonymous fn/14 in Ecto.Repo.Schema.do_insert/4
    (ecto_sandbox) lib/ecto_sandbox/workers/server.ex:20: EctoSandbox.Workers.Worker.Server.create/1
    (ecto_sandbox) lib/ecto_sandbox/workers/server.ex:11: EctoSandbox.Workers.Worker.Server.store/2
    (ecto_sandbox) lib/ecto_sandbox/workers/server.ex:31: EctoSandbox.Workers.Worker.Server.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.436.0>): {:process, %{"data" => "test"}}

Notice that the test runs as #PID<0.436.0>, as seen in the last line.
A little earlier we can see the ownership changes, so the test process loses its proxy before the worker gets it:

.13:01:06.667 [error] #PID<0.436.0> owns proxy #PID<0.437.0>
13:01:06.671 [error] #PID<0.428.0>, #PID<0.430.0> lose proxy #PID<0.429.0>
13:01:06.731 [error] #PID<0.436.0> lose proxy #PID<0.437.0>

And that’s even before the test ends.


I believe the issue is that you are cleaning up the workers inside on_exit but then you have a race between executing the on_exit callback and the worker trying to use the connection to perform a database query. You need to make sure to terminate it inline, after the test.
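In other words (a sketch; `sup` and `worker_pid` stand in for however the app tracks the supervisor and worker):

```elixir
# Racy: on_exit callbacks run after the test process has exited, so the
# sandbox connection can be checked back in while the worker is still
# handling a message that performs a query.
on_exit(fn -> DynamicSupervisor.terminate_child(sup, worker_pid) end)

# Safer: terminate the worker synchronously at the end of the test body,
# while the test process (and therefore its connection) is still alive.
:ok = DynamicSupervisor.terminate_child(sup, worker_pid)
```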


@josevalim thanks for your response!
At first I also thought it was a race condition. But then I added the notification system (via Phoenix PubSub), which notifies the subscribers here after performing the insert. As this is the only database operation of that specific worker, it should not use the connection afterwards.
And because the tests wait for the worker to insert the data here, the cleanup should only happen after the workers have finished their job.
To verify this I removed the cleanup. As you can see here the error persists: https://semaphoreci.com/semlabs/ecto_sandbox/branches/master/builds/26

You need to keep the clean-up. You have to invoke it at the end of each test though:

test "Test 1" do

I added the cleanup after each test and also added an output to check if the test reaches the end:

test "Test 1", %{conn: conn, worker: worker} do
  conn = post(conn, "/collectors/#{worker.id}", ~s({"data": "test"}))

  assert conn.status == 201
  assert_receive {:chunk_stored, "stored"}
  IO.puts("Finished test 1")

Unfortunately it does not circumvent the problem: https://semaphoreci.com/semlabs/ecto_sandbox/branches/master/builds/28
Interestingly, the output Finished test 1 is missing when test 1 fails. So the test doesn’t even finish correctly?
And as before the test (#PID<0.435.0>) loses its connection even before the worker process can handle the call:

.07:31:37.545 [error] #PID<0.435.0> owns proxy #PID<0.436.0>
07:31:37.550 [error] #PID<0.428.0>, #PID<0.430.0> lose proxy #PID<0.429.0>
07:31:37.607 [error] #PID<0.439.0> owns proxy #PID<0.440.0>
07:31:37.607 [error] #PID<0.435.0> lose proxy #PID<0.436.0>
07:31:37.616 [error] GenServer {Reg, "6b7f29ea-c6e2-4ece-881f-78a09fdb5fe4"} terminating
** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.437.0>.

You need to call DynamicSupervisor terminate_child and not the Supervisor one. Maybe we should make those calls send different messages to make sure they don’t work by accident.

EDIT: Sorry, nevermind, it shouldn’t make a difference.

It is not clear how to solve this issue then and I cannot reproduce it locally. :frowning:


@josevalim thanks for sticking to this issue!
I cannot reproduce this issue locally myself. It seems to be a timing issue which only appears when the machine is very fast. We experienced this running mix test directly, but also when running it in a Docker container (both on Semaphore).
I tried to dig into db_connection and its ownership part, and I noticed that the owning process (the test) loses its connection because of a shutdown (here: https://github.com/elixir-ecto/db_connection/blob/master/lib/db_connection/ownership/manager.ex#L159), but as I mentioned earlier this happens before the test reaches its end and before the GenServer uses the connection.

PS: If you think it is beneficial, I can add you to the Semaphore project. There is the possibility to SSH into the machine and investigate.


@josevalim can we provide other information to help resolve this issue?

@mbaeuerle without a mechanism to reproduce the issue there isn’t much I can do. Unfortunately I do not have enough time to ssh into the machine and reproduce it. A minimal test case would be ideal. I am sorry. :frowning:

@mbaeuerle how reproducible is this error on CI for you? I think I just ran into it, but I’ve retried the build 10 more times without issue so I’m not sure how to replicate the issue.


@axelson that heavily depends. Sometimes there are several failures in a row, and then I have to run it 10 or 20 times to get the error. So it sounds likely that you are running into this issue.
In the minimal test case I set up the CI to repeat until it hits the error, in the last case this was on the third try: https://semaphoreci.com/semlabs/ecto_sandbox/branches/master/builds/26

Just hit this twice in a row on CI. Very frustrating. We’re seeing an error maybe 1/20 times, seems to happen the most in our Phoenix channel tests.

One thing you can try is to run those particular tests as:

elixir --erl "+T 9" -S mix test path/to/test/file

And see if it reproduces the issue more frequently (+T 9 enables the emulator’s “modified timing” at its highest level, which perturbs process timing to surface races). Basically, we need a mechanism to reproduce the issue consistently so we can investigate the root cause and work on a fix.


Thanks for the recommendation! I will try it out and report back.

No luck with this locally, although I still have to try it on CI.