I’ve got an annoying problem on my hands. I’m open to a different approach as well.
I’m writing a GenServer to hold the state of a live auction. When bids are submitted, they’re stored in the database with Ash actions, and the GenServer picks them up from PubSub.
I want to introduce a recovery mechanism so that if a GenServer crashes it can reload from the database and continue working. My current implementation does this in the GenServer.init/1 callback.
I’m writing tests for this and something that I can’t figure out is: when a GenServer is killed, the DynamicSupervisor will restart it and it will run the init/1 again. So far so good.
However, the new process has a different PID and Ecto.Adapters.SQL.Sandbox doesn’t allow connecting from it. I want to avoid :shared mode so tests can be async.
At this point there’s a race condition: on startup the restarted GenServer tries to hit the database, can’t get a connection, and crashes.
I can’t allow the new PID because I don’t know what it is, and by the time I look it up in the registry, it has already crashed.
Has anyone managed to instrument tests for this scenario? Ideally without polluting the application code with these implementation details.
An approach I’ve used is passing the test process’s PID along when starting the child in DynamicSupervisor.
The GenServer then calls Ecto.Adapters.SQL.Sandbox.allow(YourRepo, parent_pid, self()) in init/1, before it touches the database.
I don’t know if there’s an “official” way to ask the repo “are you sandbox”; there wasn’t when I wrote the code using this feature, so I did it by checking Application.fetch_env!(:your_app_name, YourRepo)[:pool] - if it’s the atom Ecto.Adapters.SQL.Sandbox, then the sandbox is enabled.
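A minimal sketch of that idea. The module name YourApp.AuctionServer and the :sandbox_owner option are made up for illustration, and it uses Application.get_env/3 with a default instead of fetch_env!/2 so that a missing config simply means "no sandbox":

```elixir
defmodule YourApp.AuctionServer do
  use GenServer

  # Hypothetical option: tests pass `sandbox_owner: self()`, production passes nothing.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(opts) do
    owner = Keyword.get(opts, :sandbox_owner)

    if is_pid(owner) and sandbox?() do
      # Let this (possibly restarted) process use the owner's sandbox connection.
      Ecto.Adapters.SQL.Sandbox.allow(YourRepo, owner, self())
    end

    # Reloading the auction from the database would go here.
    {:ok, %{sandbox_owner: owner}}
  end

  # True only when the repo is configured with the sandbox pool
  # (typically in config/test.exs).
  defp sandbox? do
    config = Application.get_env(:your_app_name, YourRepo, [])
    config[:pool] == Ecto.Adapters.SQL.Sandbox
  end
end
```

Because the supervisor restarts the child from the same child spec, the restarted process receives the same owner PID and calls allow again without the test having to know the new PID.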
While your GenServer is doing that, another useful thing to add is Process.monitor(parent_pid). That allows avoiding two race-for-the-flag scenarios that spam the logs and crash unexpectedly:
- if the test process exits but the GenServer is still running, subsequent Repo interactions will raise an error. Once you’re monitoring parent_pid, the GenServer can take appropriate action when the :DOWN message arrives. Usually that action is to return a {:stop, reason, state} tuple, but you may also have cleanup to do.
- if the GenServer crashes at just the wrong time, the DynamicSupervisor will restart it after the test process has already exited. Again, this will cause any Repo interaction in the GenServer to fail.
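The monitoring part could look roughly like this. OwnerAwareServer is a hypothetical name, and the sketch leaves out the Repo calls so it runs on its own:

```elixir
defmodule OwnerAwareServer do
  use GenServer

  def start_link(parent_pid) when is_pid(parent_pid) do
    GenServer.start_link(__MODULE__, parent_pid)
  end

  @impl true
  def init(parent_pid) do
    # Monitoring a PID that is already dead delivers a :DOWN message
    # with reason :noproc right away.
    ref = Process.monitor(parent_pid)
    {:ok, %{parent: parent_pid, ref: ref}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    # Any cleanup would go here. Stopping with :normal avoids a crash report.
    {:stop, :normal, state}
  end

  def handle_info(_msg, state), do: {:noreply, state}
end
```

This assumes the child is started with restart: :transient. With the default restart: :permanent that use GenServer puts in the child spec, the supervisor would restart the process even after a :normal stop.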
One other note: make sure you’ve tuned the restart parameters of DynamicSupervisor correctly, especially when running fast tests in parallel. Nothing more frustrating than trying to figure out why your tests fail randomly, but only when all run together, and only on a fast machine…
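For reference, the restart limits are the max_restarts and max_seconds options of DynamicSupervisor. The values below are purely illustrative, not recommendations:

```elixir
{:ok, sup} =
  DynamicSupervisor.start_link(
    strategy: :one_for_one,
    # The defaults are max_restarts: 3 within max_seconds: 5.
    max_restarts: 20,
    max_seconds: 1
  )
```

When the limit is exceeded the supervisor itself terminates and takes all of its children with it, so in an async suite that shares one supervisor a few deliberate kills in one test can bring down servers that belong to other tests.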
Have your GenServer set the :"$callers" key in its process dictionary, and the Ecto sandbox should be able to figure out ownership. More info on caller tracking is in the Task documentation (Elixir v1.18.4).
Basically, you’ll have something like this:
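defmodule MyGenserver do
  use GenServer

  def start_link(arg) do
    # Runs in the calling process, so self() here is the caller:
    # put it at the head of the caller's own :"$callers" list.
    callers = [self() | Process.get(:"$callers", [])]
    GenServer.start_link(__MODULE__, {arg, callers})
  end

  def init({arg, callers}) do
    # Runs in the new process: adopt the caller chain recorded above.
    Process.put(:"$callers", callers)
    ...
  end
end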
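One thing to watch with a DynamicSupervisor: start_link/1 runs in the supervisor process, not in the test, so the caller list has to be captured where the child is requested and passed in through the child spec. The sketch below assumes that variant. CallerAwareServer and RestartHelper are hypothetical names and there is no Repo involved. It only shows that the restarted process still carries the original caller chain:

```elixir
defmodule CallerAwareServer do
  use GenServer

  # `callers` is captured by whoever asks for the child (e.g. the test process).
  def start_link(callers) when is_list(callers) do
    GenServer.start_link(__MODULE__, callers)
  end

  def callers(server), do: GenServer.call(server, :callers)

  @impl true
  def init(callers) do
    Process.put(:"$callers", callers)
    {:ok, nil}
  end

  @impl true
  def handle_call(:callers, _from, state) do
    {:reply, Process.get(:"$callers"), state}
  end
end

defmodule RestartHelper do
  # Polls until the supervisor's only child is a live PID other than old_pid.
  def await_restart(sup, old_pid) do
    case DynamicSupervisor.which_children(sup) do
      [{_, new_pid, _, _}] when is_pid(new_pid) and new_pid != old_pid ->
        new_pid

      _ ->
        Process.sleep(10)
        await_restart(sup, old_pid)
    end
  end
end

# In the test process: capture the caller chain once.
callers = [self() | Process.get(:"$callers", [])]

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {CallerAwareServer, callers})

# Kill the server; the supervisor restarts it from the same child spec.
Process.exit(pid, :kill)
new_pid = RestartHelper.await_restart(sup, pid)

# The new process has a different PID but the same caller chain.
new_callers = CallerAwareServer.callers(new_pid)
true = self() in new_callers
```

Since the child spec is what the supervisor replays on restart, anything stored in its argument survives the crash, which is what makes this work without knowing the new PID in advance.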
This is my preferred approach as well, and it works for tests because :"$callers" is already set in the test process.
That’s a really elegant solution, thanks!