I’ve been ignoring it up to this point because the test suite still passes and it hasn’t caused any development or deployment bottlenecks, but I’d like to figure out why it’s happening and what I might be able to do to stop it.
Version details, in case it matters:
Postgres v10.19 via the postgres:10-alpine Docker image
Is there a particular strategy I can use to keep the test process alive until the Repo call completes? I’m not doing anything particularly unusual in any tests, though I think some of my on_exit teardown code might involve some database cleanup. When testing code that relies on DB triggers, I have to run the tests unsandboxed or the triggers won’t fire. So those particular tests get cleaned up manually.
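For reference, the unsandboxed trigger tests follow roughly this pattern (a sketch only; MyApp.AuditLog is a hypothetical schema written by the trigger, and sandbox: false checks out a connection without the wrapping transaction so the writes are actually committed):

setup do
  :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApp.Repo, sandbox: false)

  on_exit(fn ->
    # on_exit runs in its own process, so it needs its own connection
    # to clean up the committed rows.
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApp.Repo, sandbox: false)
    MyApp.Repo.delete_all(MyApp.AuditLog)
  end)
end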
Did anyone figure out a fix for this? I’m having the same issue with my tests and since the errors appear to be fairly random, I have no idea how to even begin fixing them.
Still haven’t figured it out. Other priorities have prevented me from spending much time on it. I don’t think it’s a Task. It might be a GenServer that gets started by the test setup, but I do have code to stop it in the on_exit. So for now I’m still stumped.
Just to chime in: we seem to have the exact same intermittent errors.
We discovered they were MUCH worse when running the Elixir process locally but the database in Docker on M1 Macs. The working theory is that the network connection in Docker for Mac is quite slow.
The error also happens a lot more, along with lots of other intermittent failures, when running in GitHub CI, where the runners are fairly resource-constrained.
So I'm wondering if it's something to do with running lots of DB-intensive work against some form of resource bottleneck, e.g. on GitHub Actions there are only 2 cores available for the DB and the parallel tests combined.
The error means that a DB query is still live when the test is torn down.
In our case it was due to some of our LiveView tests clicking a submit button that caused a push_patch to happen in a live_component rather than in the parent LiveView.
For some reason LiveViewTest will wait for the patch before returning if it happens in the parent process, but not if it happens in a child component.
You can force it to wait by doing an assert_patch, or by calling any of the functions that use the view object.
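Roughly like this (a sketch, assuming a hypothetical /users/new route whose #user-form lives in a child component and push_patches on submit):

test "submit in a child component patches the URL", %{conn: conn} do
  {:ok, view, _html} = live(conn, "/users/new")

  # render_submit/1 can return before the child component's push_patch
  # has been processed...
  view
  |> form("#user-form", user: %{name: "Jane"})
  |> render_submit()

  # ...so make the test wait for the patch (and any DB work it triggers)
  # before the sandbox is torn down.
  assert_patch(view, "/users/jane")
end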
Not sure if that helps you, but thought I'd add it as it was a little confusing.
Thanks for that info, @danturner. It turns out that all the tests exhibiting this behavior had one thing in common: they all started a GenServer (using start_supervised!) that was running a periodic query in the background. I added an on_exit handler to tell the GenServer to stop its polling before the test continues with its teardown, and everything's a-ok now. Thanks for the pointer!
Any tips on how to debug this? I'm working on a 5,000-test code base that shows this warning intermittently, and I would love to find which line triggers it in order to fix it…
If you find yourself in a similar situation, here is what worked so far:
Run mix test --trace to see which test file has the test that triggers this error
Put a setup block on this test file with on_exit(fn -> :timer.sleep(1000) end). This gives some breathing room between the test runs; since on_exit runs in another process, the test process is able to finish normally (see the sketch after this list).
Run mix test test/app_name/problematic_test.exs --trace
This will show the DBConnection error right below the problematic test.
Since this is an intermittent error, it's also useful to run the file multiple times with something like:
$ for i in {1..10}; do mix test test/app_name/problematic_test.exs --trace; done
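For step 2, the setup block might look like this (a debugging aid only; the one-second delay is arbitrary):

# Delay teardown so a stray query errors while the problematic test is
# still the most recent one in the --trace output. Remove once found.
setup do
  on_exit(fn -> :timer.sleep(1000) end)
  :ok
end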
I found that some of my tests had GenServers running (started in the test setup using start_supervised!/2), and those GenServers were outliving the test's sandbox repo connection. I had to make sure I had an on_exit handler that stopped each GenServer from continuing to use the database before the rest of the test teardown could happen.
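Roughly like this (a sketch, assuming a hypothetical MyApp.Poller GenServer that queries the DB on a timer and exposes a stop_polling/1 call):

setup do
  pid = start_supervised!(MyApp.Poller)

  on_exit(fn ->
    # stop_polling/1 is a hypothetical call that cancels the poll timer.
    # Guard against the process already being down, since the test
    # supervisor may have shut it down before on_exit runs.
    if Process.alive?(pid), do: MyApp.Poller.stop_polling(pid)
  end)

  %{poller: pid}
end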
I finally found that my DataCase modules were still using the older sandbox approach, with the following syntax:
setup tags do
  :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyModule.Repo)

  unless tags[:async] do
    Ecto.Adapters.SQL.Sandbox.mode(MyModule.Repo, {:shared, self()})
  end

  :ok
end
and I replaced it with the new syntax:
setup tags do
  MyModule.DataCase.setup_sandbox(tags)
  :ok
end

def setup_sandbox(tags) do
  pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyModule.Repo, shared: not tags[:async])
  on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
end
The new approach uses the on_exit functionality of ExUnit, which allows “nested” on_exit callbacks to complete before stopping the sandbox for a test. The old approach would monitor the test process itself and stop the sandbox as soon as the test process stopped, which is before on_exit callbacks execute. That's why the new API is to be preferred, and I'd say that holds generally, not just here.