Intermittent test errors with Ecto sandbox

stevedomin · February 4, 2018, 2:34pm

Hello,

I have a bunch of concurrent API integration tests that rely on the Ecto Sandbox.

In each test I’m using HTTPoison to make a request to a specific API endpoint. In a setup block I put the Ecto sandbox metadata in a request header, and the built-in Phoenix plug parses that header and allows the process handling the request to use my test's sandboxed connection (this is pretty much what Wallaby and Hound do).
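To make the setup above concrete, here is a toy model. It is written in Python purely as a language-neutral illustration and is not Ecto or phoenix_ecto code. The test process checks out a connection and encodes its identity into a request header. A server-side step then decodes the header and allows the request handler to borrow the test's connection. The class `ToySandbox`, the helpers `encode_metadata` and `plug_allow`, the `x-sandbox-metadata` header name, and the base64 JSON encoding are all hypothetical. The real plug is `Phoenix.Ecto.SQL.Sandbox`, and its actual header and encoding differ.

```python
import base64
import json


class OwnershipError(Exception):
    """Raised when a process uses the sandbox without owning or being allowed."""


class ToySandbox:
    """Hypothetical stand-in for a per-test connection sandbox (not Ecto's API)."""

    def __init__(self):
        self.owners = set()  # processes that checked out a connection
        self.allowed = {}    # allowed process -> owner it borrows from

    def checkout(self, owner):
        self.owners.add(owner)

    def allow(self, owner, pid):
        if owner not in self.owners:
            raise OwnershipError(f"cannot find ownership process for {owner}")
        self.allowed[pid] = owner

    def query(self, pid):
        # A process may query if it owns a connection or borrows a live owner's.
        owner = pid if pid in self.owners else self.allowed.get(pid)
        if owner not in self.owners:
            raise OwnershipError(f"cannot find ownership process for {pid}")
        return f"ok via {owner}"


def encode_metadata(owner):
    # Test side: pack the owner id into a header value (format is an assumption).
    payload = base64.urlsafe_b64encode(json.dumps({"owner": owner}).encode()).decode()
    return {"x-sandbox-metadata": payload}


def plug_allow(sandbox, headers, handler_pid):
    # Server side: decode the header and let the request handler borrow the
    # test's connection, roughly the role the Phoenix sandbox plug plays.
    raw = headers.get("x-sandbox-metadata")
    if raw is None:
        return
    meta = json.loads(base64.urlsafe_b64decode(raw.encode()))
    sandbox.allow(meta["owner"], handler_pid)


sandbox = ToySandbox()
sandbox.checkout("test-1")
plug_allow(sandbox, encode_metadata("test-1"), "handler-42")
print(sandbox.query("handler-42"))  # ok via test-1
```

The key point is that the request handler never owns a connection itself. It only borrows one through the test process, so everything it does depends on that test still being alive.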

Everything was working fine until recently. I’ve started seeing intermittent test failures with different failure modes:

sometimes I’m seeing the classic:

(DBConnection.OwnershipError) cannot find ownership process for #PID<0.2796.0>.

sometimes I’m getting this:

** (exit) exited in: GenServer.call(#PID<0.1037.0>, {:checkout, #Reference<0.45022272.2543845379.42504>, true, 15000}, 5000) 
   ** (EXIT) shutdown: "owner #PID<0.1036.0> exited with: shutdown"

sometimes one of the resources I create in the setup block (an access token) is not found. I'm not sure how that’s possible.

Note that:

the test suite sometimes runs perfectly fine (not a single error);
the issue disappears when running with --max-cases 1.

Did anyone run into a similar issue? How would you go about debugging the issue?

Thanks

mbaeuerle · April 16, 2018, 11:41am

It seems several people are having this issue (though --max-cases 1 doesn’t fix it for us).
I created a repo that reproduces it; you can find more detailed information in my reply to the same issue here: Tests randomly failing when run in a VM
Unfortunately, this does not solve the issue, but maybe it will help us investigate the cause.

josevalim · April 16, 2018, 1:22pm

That happens because you have a long-running process that uses the connection during the test. Then the test terminates, and the process tries to use the connection again, but it cannot, because the owner process (the test) is done.

You need to do as @mbaeuerle did and explicitly terminate any process that may have gotten hold of the connection. But remember to do it inside the test itself and not inside an on_exit callback, because on_exit callbacks only run after the test process (the owner) has already exited.
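The failure mode described above can be reproduced in a small deterministic model. Again, this is a hypothetical Python sketch and not Ecto's implementation: `ToySandbox`, `Worker`, and `run_test` are illustrative names. In the model, a connection stays usable only while its owner (the test) is alive. A worker allowed through that owner fails with an ownership error if it queries after the owner has gone. Stopping the worker inside the test body, before the owner exits, avoids the error, while stopping it afterwards (the on_exit position) comes too late.

```python
class OwnershipError(Exception):
    pass


class ToySandbox:
    """Hypothetical model: a connection is usable only while its owner is alive."""

    def __init__(self):
        self.owners = set()
        self.allowed = {}

    def checkout(self, owner):
        self.owners.add(owner)

    def checkin(self, owner):
        # The owner (test process) exited: its connection is returned, so every
        # process allowed through it loses access too.
        self.owners.discard(owner)

    def allow(self, owner, pid):
        self.allowed[pid] = owner

    def query(self, pid):
        owner = pid if pid in self.owners else self.allowed.get(pid)
        if owner not in self.owners:
            raise OwnershipError(f"cannot find ownership process for {pid}")
        return "ok"


class Worker:
    """Hypothetical long-running process that queries on every tick."""

    def __init__(self, sandbox, pid):
        self.sandbox, self.pid, self.running = sandbox, pid, True

    def tick(self):
        if self.running:
            return self.sandbox.query(self.pid)
        return "stopped"

    def stop(self):
        self.running = False


def run_test(stop_inside_test):
    """Return the ownership errors seen, depending on when the worker is stopped."""
    sandbox = ToySandbox()
    sandbox.checkout("test")
    worker = Worker(sandbox, "worker")
    sandbox.allow("test", "worker")
    errors = []
    worker.tick()              # the worker uses the connection during the test
    if stop_inside_test:
        worker.stop()          # fix: terminate it before the test returns
    sandbox.checkin("test")    # the test process exits, so the owner is gone
    try:
        worker.tick()          # a late message still reaches the worker
    except OwnershipError as exc:
        errors.append(str(exc))
    worker.stop()              # stopping here (on_exit position) is too late
    return errors


print(run_test(stop_inside_test=False))  # ['cannot find ownership process for worker']
print(run_test(stop_inside_test=True))   # []
```

In real ExUnit code, the equivalent of `worker.stop()` inside the test would be stopping the process as the last step of the test body, for example with `GenServer.stop/1` or `stop_supervised/1`.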