I’ve been ignoring it up to this point because the test suite still passes and it hasn’t caused any development or deployment bottlenecks, but I’d like to figure out why it’s happening and what I might be able to do to stop it.
Version details, in case it matters:
Postgres v10.19 via the postgres:10-alpine Docker image
Is there a particular strategy I can use to keep the test process alive until the Repo call completes? I’m not doing anything particularly unusual in any tests, though I think some of my on_exit teardown code might involve some database cleanup. When testing code that relies on DB triggers, I have to run the tests unsandboxed or the triggers won’t fire. So those particular tests get cleaned up manually.
Still haven’t figured it out. Other priorities have prevented me from spending much time on it. I don’t think it’s a Task. It might be a GenServer that gets started by the test setup, but I do have code to stop it in the on_exit. So for now I’m still stumped.
Just to chime in with we seem to have the exact same intermittent errors.
We discovered they were MUCH worse when running the Elixir process locally but the database in docker on M1 Macs. Working theory being that it’s because the network connection on docker for Mac is quite slow.
The error also happens a lot more along with lots of intermittent failures when running in Github CI which are fairly resource constrained instances.
So wondering if it’s something to do with running lots of DB intensive stuff with some form of resource bottleneck. E.g. on Github actions there’s only 2 cores available for DB and parallel tests combined.
Thanks for that info, @danturner. It turns out that all the tests exhibiting this behavior had one thing in common – they all started a GenServer (using start_supervised!) that was running a periodic query in the background. I added an on_exit handler to tell the GenServer to stop its polling behavior before the test continues with its teardown, and everything’s a-ok now. Thanks for the pointer!