Intermittent DBConnection.ConnectionError in tests

When running my tests, I sometimes see an error that looks like this:

[error] Postgrex.Protocol (#PID<0.788.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.7569.0> exited

Notable details:

  • It doesn’t always happen
  • When it does happen, it’s not always in the same test or test file
  • It doesn’t actually cause any tests to fail; it just shows up as noise in the middle of the test output

I’ve been ignoring it up to this point because the test suite still passes and it hasn’t caused any development or deployment bottlenecks, but I’d like to figure out why it’s happening and what I might be able to do to stop it.

Version details, in case it matters:

  • Postgres v10.19 via the postgres:10-alpine Docker image
  • postgrex v0.16.1
  • ecto v3.7.1
  • ecto_sql v3.7.2
  • ecto_psql_extras v0.7.4

Any thoughts?

The usual cause of that message is code that’s still waiting for the result of a Repo call when the test process shuts down.
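
A minimal way it shows up looks roughly like this (MyApp.Repo is a stand-in for the real repo, and the pg_sleep just makes the race easy to hit):

    defmodule MyApp.LingeringQueryTest do
      use ExUnit.Case, async: false

      setup do
        # Check out a sandbox connection and share it, so processes spawned
        # by the test can run queries on it.
        :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApp.Repo)
        Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, {:shared, self()})
        :ok
      end

      test "a supervised task is still waiting on a query when the test ends" do
        task =
          start_supervised!({Task, fn ->
            # Deliberately slow; still in flight when the test returns.
            MyApp.Repo.query!("SELECT pg_sleep(5)")
          end})

        # The test finishes right away; teardown then kills the task while it
        # is still waiting on the Repo call, and the pool logs:
        #   disconnected: ** (DBConnection.ConnectionError) client #PID<...> exited
        assert Process.alive?(task)
      end
    end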

2 Likes

Is there a particular strategy I can use to keep the test process alive until the Repo call completes? I’m not doing anything particularly unusual in any tests, though I think some of my on_exit teardown code might involve some database cleanup. When testing code that relies on DB triggers, I have to run the tests unsandboxed or the triggers won’t fire. So those particular tests get cleaned up manually.
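
For illustration, those trigger tests look roughly like this (table and module names are stand-ins):

    defmodule MyApp.TriggerTest do
      use ExUnit.Case, async: false

      setup do
        # Check out a connection *without* the sandbox transaction so the
        # trigger fires on a real commit.
        :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApp.Repo, sandbox: false)

        on_exit(fn ->
          # on_exit runs in its own process, so it needs its own checkout
          # before it can clean up by hand.
          :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApp.Repo, sandbox: false)
          MyApp.Repo.query!("TRUNCATE widgets, widget_audit RESTART IDENTITY CASCADE")
        end)

        :ok
      end

      test "the audit trigger fires on insert" do
        MyApp.Repo.query!("INSERT INTO widgets (name) VALUES ($1)", ["demo"])

        %{rows: [[count]]} = MyApp.Repo.query!("SELECT count(*) FROM widget_audit")
        assert count == 1
      end
    end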

1 Like

I think we’re having the same issue and we haven’t nailed the root cause either. Things I’d look into:

Could it be a Task that gets spawned somewhere?

Did anyone figure out a fix for this? I’m having the same issue with my tests and since the errors appear to be fairly random, I have no idea how to even begin fixing them.

Still haven’t figured it out. Other priorities have prevented me from spending much time on it. I don’t think it’s a Task. It might be a GenServer that gets started by the test setup, but I do have code to stop it in the on_exit. So for now I’m still stumped.

Just to chime in: we seem to have the exact same intermittent errors.

We discovered they were MUCH worse when running the Elixir process locally but the database in Docker on M1 Macs. Our working theory is that the network connection in Docker for Mac is quite slow.

The error also happens a lot more, along with lots of intermittent failures, when running in GitHub CI, where the instances are fairly resource-constrained.

So we’re wondering if it’s something to do with running lots of DB-intensive work under some form of resource bottleneck, e.g. on GitHub Actions there are only 2 cores available for the database and the parallel tests combined.

1 Like

Heya,

Not sure if you’re still having the issue…

The error means that a DB query is still in flight when the test is torn down.

In our case it was due to some of our LiveView tests clicking a submit button that caused a push_patch to happen in a live_component rather than in the parent LiveView.

For some reason LiveViewTest will wait for the patch before returning if it happens in the parent LiveView, but not if it happens in a child component.

You can force it to wait by doing an assert_patch, or by calling any of the functions that use the view object.
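
Roughly what that looks like in a test (the route, form id, and patch target here are made up):

    defmodule MyAppWeb.WidgetLiveTest do
      use MyAppWeb.ConnCase, async: true

      import Phoenix.LiveViewTest

      test "submit handled by a child live_component", %{conn: conn} do
        {:ok, view, _html} = live(conn, "/widgets")

        # The submit is handled inside a child live_component, which then
        # push_patches to the parent LiveView.
        view
        |> form("#widget-form", widget: %{name: "demo"})
        |> render_submit()

        # Without this the test can return while the patch (and any DB work
        # behind it) is still in flight; assert_patch makes the test wait.
        assert_patch(view, "/widgets/new")
      end
    end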

Not sure if that helps you, but I thought I’d add it since the behaviour was a little confusing.

I raised an issue here, as the behaviour feels inconsistent: LiveviewTest render_submit behaves differently if form is in parent liveview or child live_component · Issue #2579 · phoenixframework/phoenix_live_view · GitHub

3 Likes

Thanks for that info, @danturner. It turns out that all the tests exhibiting this behavior had one thing in common – they all started a GenServer (using start_supervised!) that was running a periodic query in the background. I added an on_exit handler to tell the GenServer to stop its polling behavior before the test continues with its teardown, and everything’s a-ok now. Thanks for the pointer!
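
In case it helps anyone later, the shape of the fix (MyApp.Poller stands in for the real GenServer, and stop_polling/1 is assumed to be a synchronous call):

    defmodule MyApp.PollerTest do
      use MyApp.DataCase, async: true

      setup do
        poller = start_supervised!(MyApp.Poller)

        on_exit(fn ->
          # Tell the poller to stop querying before the rest of the teardown
          # continues; since the call is synchronous, any in-flight query has
          # finished by the time it returns, so nothing gets killed
          # mid-Repo-call.
          :ok = MyApp.Poller.stop_polling(poller)
        end)

        %{poller: poller}
      end

      test "the poller keeps running during the test", %{poller: poller} do
        assert Process.alive?(poller)
      end
    end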

2 Likes