Ecto streams, Flow and "could not checkout the connection owned by PID" errors

cblavier · October 28, 2019, 12:19pm

Hi there,

I’m using Flow to process data streamed from Postgres (using the Bourne library).

It works properly, but I’m having a hard time getting my tests to pass because of random "could not checkout the connection owned by PID" errors.

Here is what my code under test is doing:

1. Create 4 Postgres streams (using Bourne).
2. Retrieve a few settings from the DB using regular Ecto queries.
3. Run my Flow process using Flow.from_stages, fed by the Postgres streams and the retrieved settings.

It’s failing 75% of the time.

If I swap the first two steps and run 2, 1, then 3, it seems to work properly.

So it looks like creating the streams is somehow locking my Ecto.Sandbox connection. Any idea how to fix this? (I could indeed swap the first two steps, but that is a bit awkward and it also feels hack-ish to me.)

ityonemo · October 28, 2019, 1:14pm

Without more insight into the code, if it’s failing 75% of the time my instinct is that it’s a race condition. Have you tried dropping in Process.sleep/1 between the various steps to see if that changes the error rate?

Also, if I’m not mistaken, Ecto knows which sandbox connection to associate with which PID by looking at the :"$callers" entry in the process dictionary. If it’s failing in the Flow part, which is going to run in another process, you should drop in a Process.get(:"$callers") |> IO.inspect() (if you can) and make sure that you can see the test’s PID in there.
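For anyone who wants to see what that check looks like in isolation, here is a minimal sketch using only the standard library. It assumes nothing about Flow or Bourne: the current process stands in for the test process, and it compares a process started through Task.async/1 with one started through plain spawn/1.

```elixir
# The current process stands in for the test process.
parent = self()

# Task.async/1 records the caller chain in the new process's dictionary.
callers = Task.async(fn -> Process.get(:"$callers") end) |> Task.await()

# A process started with plain spawn/1 gets no :"$callers" entry at all.
spawn(fn -> send(parent, {:spawned, Process.get(:"$callers")}) end)

spawned_callers =
  receive do
    {:spawned, value} -> value
  after
    1_000 -> :timeout
  end

IO.inspect(callers, label: "Task callers")
IO.inspect(spawned_callers, label: "spawn callers")
```

If the inspected value inside a Flow stage looks like the second case, the sandbox has no caller chain leading back to the test process.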

sb8244 · October 28, 2019, 1:47pm

Are your tests running async? Do they work all of the time when not running in async mode?

This sounds an awful lot like an issue with the test connection checkout (the Ecto sandbox). It’s possible to get this working with async tests, but it could be somewhat difficult.

If it fails even when running with async: false, my next check would be to see whether the Flow process still exists after the test process exits.
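As a rough illustration of the non-async setup, the sketch below puts the sandbox in shared mode so that processes outside the test's caller chain reuse the test's connection. MyApp.Repo and the test module name are hypothetical, and the example needs a project with ecto_sql and a configured Postgres repo to run.

```elixir
defmodule MyApp.SharedSandboxTest do
  # Hypothetical module; MyApp.Repo is an assumed repo name.
  # Shared mode only works when tests do not run concurrently.
  use ExUnit.Case, async: false

  alias Ecto.Adapters.SQL.Sandbox

  setup do
    # Check out one sandboxed connection owned by the test process.
    :ok = Sandbox.checkout(MyApp.Repo)
    # Let every other process use that same connection.
    Sandbox.mode(MyApp.Repo, {:shared, self()})
    :ok
  end

  test "an unrelated process can query through the shared connection" do
    test_pid = self()

    # spawn/1 sets no caller chain, so this relies on shared mode alone.
    spawn(fn ->
      result = MyApp.Repo.query!("SELECT 1")
      send(test_pid, {:rows, result.rows})
    end)

    assert_receive {:rows, [[1]]}, 1_000
  end
end
```

For async tests the alternative is Sandbox.allow(MyApp.Repo, self(), pid) for each process that touches the repo, which is harder when a library such as Flow starts those processes for you.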

cblavier · October 28, 2019, 4:08pm

My tests are already running in async mode.

But I managed to work around the issue by changing the Bourne streaming behavior from a Postgres cursor to simple SQL limit + offset. I guess using a cursor was eagerly creating a new idle connection, which Ecto.Sandbox did not like.
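To make the limit + offset idea concrete, here is a small sketch of a paged stream. It is not Bourne's implementation: PagedStream is a hypothetical helper, and the fetch function is an in-memory stand-in for a query that would, in a real app, probably be something like a Repo.all call with limit and offset applied.

```elixir
defmodule PagedStream do
  @moduledoc "Hypothetical helper, not part of Bourne or Ecto."

  # fetch_page is any function (limit, offset) -> list of rows.
  def stream(fetch_page, page_size) when page_size > 0 do
    Stream.resource(
      # Start at offset 0.
      fn -> 0 end,
      fn
        # A short page was already emitted, so stop.
        :done ->
          {:halt, :done}

        offset ->
          case fetch_page.(page_size, offset) do
            [] -> {:halt, offset}
            rows when length(rows) < page_size -> {rows, :done}
            rows -> {rows, offset + page_size}
          end
      end,
      # Nothing to clean up: no cursor or transaction is held open.
      fn _ -> :ok end
    )
  end
end

# In-memory stand-in for a LIMIT/OFFSET query over a 10-row table.
rows = Enum.to_list(1..10)
fetch_page = fn limit, offset -> Enum.slice(rows, offset, limit) end

result = PagedStream.stream(fetch_page, 4) |> Enum.to_list()
IO.inspect(result, label: "streamed rows")
```

Each page here is an ordinary, short-lived query, so nothing has to keep a connection checked out between pages. The usual trade-off is that offset pagination needs a stable ORDER BY and gets slower on large offsets.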