Ecto Sandbox ownership issue when persisting state in terminate callback

totorigolo · February 18, 2021, 11:26pm

Hello,

Context
I am developing a toy website to get familiar with Elixir and Phoenix. The website hosts multiple shared canvases where anyone with a token can change pixels (a bit like multiple tiny https://pixelcanvas.io/). The home page shows auto-refreshing thumbnails of each canvas, and clicking one shows the full-size version with live updates over a Phoenix Channel. Canvases can be created or deleted on the fly in the admin section: creating one inserts the canvas (UUID, name) into the DB and spins up a unique GenServer for it, exactly one per canvas. Without persistence, I successfully deployed it in a cluster using global processes (via Horde). So far, so good.

Now, I want to persist the images. To do that, whenever a canvas GenServer terminates, it saves the PNG blob to the DB. When it starts, it tries to load the image from the DB. This works well: I can add and remove nodes from the cluster and the image survives.
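A stripped-down sketch of this persist-on-terminate / load-on-init pattern, under assumptions that differ from the real app: `FakeStore` is a hypothetical Agent standing in for the Repo, pixels live in a map instead of a PNG blob, and a plain DynamicSupervisor replaces Horde. One detail matters here: `terminate/2` only runs on a supervisor shutdown if the process traps exits, hence the `Process.flag/2` call in `init/1`.

```elixir
defmodule FakeStore do
  # Hypothetical stand-in for the Repo: an Agent holding %{canvas_id => pixels}.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> %{} end, name: __MODULE__)
  def load(id), do: Agent.get(__MODULE__, &Map.get(&1, id))
  def save(id, pixels), do: Agent.update(__MODULE__, &Map.put(&1, id, pixels))
end

defmodule CanvasServer do
  # Hypothetical simplified canvas server: pixels are a map keyed by {x, y}.
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)
  def set_pixel(pid, xy, color), do: GenServer.call(pid, {:set_pixel, xy, color})
  def get_pixel(pid, xy), do: GenServer.call(pid, {:get_pixel, xy})

  @impl true
  def init(id) do
    # Without this, a :shutdown exit from the supervisor kills the process
    # outright and terminate/2 never runs.
    Process.flag(:trap_exit, true)
    # Load the persisted image if one exists, otherwise start blank.
    {:ok, %{id: id, pixels: FakeStore.load(id) || %{}}}
  end

  @impl true
  def handle_call({:set_pixel, xy, color}, _from, state),
    do: {:reply, :ok, put_in(state.pixels[xy], color)}

  def handle_call({:get_pixel, xy}, _from, state),
    do: {:reply, Map.get(state.pixels, xy), state}

  @impl true
  def terminate(_reason, state) do
    # Persist on the way out (the real app writes a PNG blob to the DB here).
    FakeStore.save(state.id, state.pixels)
  end
end

{:ok, _store} = FakeStore.start_link()
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

{:ok, pid} = DynamicSupervisor.start_child(sup, {CanvasServer, "canvas-1"})
:ok = CanvasServer.set_pixel(pid, {0, 0}, :red)

# terminate_child/2 sends a :shutdown exit and waits for the child to exit,
# so terminate/2 (and the save) has completed when it returns.
:ok = DynamicSupervisor.terminate_child(sup, pid)

# A fresh server for the same canvas reloads the saved pixels.
{:ok, pid2} = DynamicSupervisor.start_child(sup, {CanvasServer, "canvas-1"})
restored = CanvasServer.get_pixel(pid2, {0, 0})
IO.puts("restored pixel: #{inspect(restored)}")
# prints "restored pixel: :red"
```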

Here is how canvas GenServers are handled:

- When I create a canvas with Canvases.create_canvas/1, the function inserts the row into the DB, starts the GenServer under a (Horde) DynamicSupervisor via CanvasManager.start_canvas_server/1, and publishes a :canvas_created PubSub event to update the LiveViews.
- When I delete a canvas with Canvases.delete_canvas/1, it deletes the canvas and its persisted images from the DB and publishes a :canvas_deleted PubSub event for both the LiveView and the CanvasManager, which in turn terminates the GenServer.
- When the application starts, CanvasManager is started under the supervision tree and starts the GenServers for all existing canvases.
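The lifecycle above can be sketched in plain OTP. This is a hypothetical, self-contained sketch, not the actual app: a Registry with duplicate keys stands in for Phoenix.PubSub, a local DynamicSupervisor and Registry stand in for Horde, an Agent named `Rows` stands in for the canvases table, and `CanvasServer` is a placeholder Agent. Its main purpose is to show that delete_canvas/1 only publishes an event, so it returns without waiting for the server to stop; that asynchronous shutdown is what makes per-test cleanup racy.

```elixir
defmodule Rows do
  # Hypothetical stand-in for the canvases table: a set of canvas ids.
  use Agent
  def start_link(_), do: Agent.start_link(fn -> MapSet.new() end, name: __MODULE__)
  def insert(id), do: Agent.update(__MODULE__, &MapSet.put(&1, id))
  def delete(id), do: Agent.update(__MODULE__, &MapSet.delete(&1, id))
  def all, do: Agent.get(__MODULE__, &MapSet.to_list/1)
end

defmodule CanvasServer do
  # Placeholder for the real per-canvas GenServer, registered by canvas id.
  use Agent

  def start_link(id),
    do: Agent.start_link(fn -> %{} end, name: {:via, Registry, {CanvasNames, id}})
end

defmodule CanvasManager do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def start_canvas_server(id),
    do: DynamicSupervisor.start_child(CanvasSup, {CanvasServer, id})

  @impl true
  def init(nil) do
    # Subscribe to canvas events, then start a server for every existing row.
    {:ok, _} = Registry.register(CanvasEvents, "canvases", [])
    Enum.each(Rows.all(), &start_canvas_server/1)
    {:ok, nil}
  end

  @impl true
  def handle_info({:canvas_deleted, id}, state) do
    case Registry.lookup(CanvasNames, id) do
      [{pid, _}] -> DynamicSupervisor.terminate_child(CanvasSup, pid)
      [] -> :ok
    end

    {:noreply, state}
  end

  def handle_info({:canvas_created, _id}, state), do: {:noreply, state}
end

defmodule Canvases do
  def create_canvas(id) do
    :ok = Rows.insert(id)
    {:ok, _pid} = CanvasManager.start_canvas_server(id)
    broadcast({:canvas_created, id})
    {:ok, id}
  end

  def delete_canvas(id) do
    :ok = Rows.delete(id)
    # Only publishes the event; CanvasManager stops the server later.
    broadcast({:canvas_deleted, id})
    :ok
  end

  defp broadcast(event) do
    Registry.dispatch(CanvasEvents, "canvases", fn subscribers ->
      for {pid, _} <- subscribers, do: send(pid, event)
    end)
  end
end

{:ok, _} =
  Supervisor.start_link(
    [
      {Registry, keys: :duplicate, name: CanvasEvents},
      {Registry, keys: :unique, name: CanvasNames},
      Rows,
      {DynamicSupervisor, strategy: :one_for_one, name: CanvasSup},
      CanvasManager
    ],
    strategy: :one_for_one
  )

{:ok, "c1"} = Canvases.create_canvas("c1")
[{pid, _}] = Registry.lookup(CanvasNames, "c1")
ref = Process.monitor(pid)
:ok = Canvases.delete_canvas("c1")

# delete_canvas/1 does not wait for the server to stop, so a test that
# needs it gone has to wait explicitly, e.g. on a monitor.
down_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> raise "canvas server did not stop"
  end
```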

The GenServer is needed to interact with the canvas: get the version, get the image as PNG, get a pixel color, set a pixel color. The DB is only used to list the existing canvases and persist the image.

Problem
I am having a hard time figuring out how to test the GenServer now that there is persistence. I understand the issue, but I don’t know the best way to address it. When I create a new canvas, I call Canvases.create_canvas/1 via a fixture; this also starts the GenServer under my application’s supervisor (not ExUnit’s). To stop the GenServer at the end of the test, the fixture calls ExUnit.Callbacks.on_exit/1. The GenServer cannot work without its row in the DB (it fails on termination), and I want to keep the supervision tree clean after each test. The issue is that on_exit callbacks run in a different process from the test process, and that process does not own the sandboxed DB connection.

I read in a GitHub issue that Jose does not recommend updating the DB from inside on_exit, so I suspect something is wrong with my design. What would you suggest instead?

- Manually waiting, in every test that creates a canvas, for the GenServer to terminate?
- Not starting the GenServer in tests that don’t need it? How would I do that? And it would not solve the issue for the tests that do need it.
- Starting the processes with ExUnit.Callbacks.start_supervised/2? How can I do that? Would that solve the ownership issue?
- Something else? I don’t have many other ideas.
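For the start_supervised/2 option, a minimal sketch, assuming the fixture starts the server itself instead of going through Canvases.create_canvas/1. `CanvasServer` here is hypothetical, and its `terminate/2` sends a message rather than writing to the DB. What it illustrates: stop_supervised/1 waits for the child to exit, so `terminate/2` finishes while the test process (the sandbox owner) is still alive. With a real Repo, the server would presumably also need a sandbox allowance (for example via Ecto.Adapters.SQL.Sandbox.allow/3), which is not shown here.

```elixir
ExUnit.start(autorun: false)

defmodule CanvasServer do
  # Hypothetical stand-in: instead of persisting to the DB, terminate/2
  # notifies the process that started it.
  use GenServer

  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  @impl true
  def init(notify) do
    # Needed so that a supervisor shutdown actually calls terminate/2.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(reason, notify) do
    send(notify, {:terminated, self(), reason})
  end
end

defmodule CanvasServerTest do
  use ExUnit.Case, async: true

  test "terminate/2 completes while the test process is still alive" do
    # Started under the per-test supervisor instead of the application's.
    pid = start_supervised!({CanvasServer, self()})

    # stop_supervised/1 waits for the child to exit, so terminate/2 has run
    # by the time it returns, i.e. before the test (sandbox owner) ends.
    assert :ok = stop_supervised(CanvasServer)
    assert_receive {:terminated, ^pid, :shutdown}
    refute Process.alive?(pid)
  end
end

result = ExUnit.run()
```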

Please tell me if you need more details. The full source code is on GitHub (there, “canvas” is called “lobby” or “image”, and the GenServer is called ImageServer; I renamed things here for clarity).

Thanks for the help!
Thomas

* For the curious: I am using a straightforward Rust NIF via Rustler to be able to mutate images.