Is GenServer the right solution to this problem?

RicoTrevisan · August 18, 2024, 10:11pm

Hi, I’ve got a list of Things – actually called Apps which can be a confusing name so let’s stick with Things – that I display to the user. The user can click on the ‘Load’ button next to each Thing. That kicks off a series of HTTPoison calls and then some database writes. Any user can click on multiple buttons and there’s a slight chance there could be multiple users using the system.

If I simply call the load/1 function, the user will be stuck on that page, especially if the user makes several load/1 calls.

I was looking for an alternative by reading some documentation, forums, and chatting it up with Claude. The acceptance criteria are something like:

a Thing can only have 1 load/1 workflow running on it at a time
if the workflow is running on a given Thing, users should not be allowed to start another one – aka disable the button.

I’m unsure of what approach I should use.

My standard approach would be to use the database, meaning

def load(thing) do
  Thing.update_thing(thing, %{processing: true})

  Task.async(... do the work ...)

  Thing.update_thing(thing, %{processing: false})
end

But I’m encountering lots of sharp edges that I would have to deal with:

what happens if the user navigates away from the original page?
how can I update the status of the button when the Thing is no longer blocked?
is writing to the same db record twice a good idea?
…

So I’m wondering whether Elixir or Phoenix has a more elegant solution. Is this the right opportunity to implement a GenServer (or GenStage)?

al2o3cr · August 19, 2024, 1:46am

An Elixir-flavored solution to this problem could look something like:

the steps of load are done in a GenServer per “app”, started when a user pushes the “Load” button
those workers are managed with a DynamicSupervisor and named using a Registry
checking the status for an app’s “Load” button is checking to see if there’s a process registered
(optional) if you want a load to stop when a user navigates away, your “Listing Apps” page could be a LiveView and the “Load” worker could monitor it & exit if no longer needed
to update the button status in realtime, the worker should send a PubSub message, then the LiveView that’s rendering the button can subscribe and update
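The first three points above can be sketched roughly as follows. This is a minimal sketch using only the standard library; the module names (Loader, Loader.Worker, Loader.Registry, Loader.WorkerSupervisor) are made up for illustration, and the Process.sleep/1 call stands in for the real HTTP calls and database writes.

```elixir
defmodule Loader.Worker do
  # Hypothetical worker: one process per Thing, registered under the Thing's id.
  # restart: :temporary means a finished or crashed load is not restarted.
  use GenServer, restart: :temporary

  def start_link(thing_id) do
    GenServer.start_link(__MODULE__, thing_id, name: via(thing_id))
  end

  # The "Load" button would be disabled while a worker is registered for this id.
  def running?(thing_id), do: Registry.lookup(Loader.Registry, thing_id) != []

  defp via(thing_id), do: {:via, Registry, {Loader.Registry, thing_id}}

  @impl true
  def init(thing_id), do: {:ok, thing_id, {:continue, :load}}

  @impl true
  def handle_continue(:load, thing_id) do
    # Placeholder for the real HTTP calls and database writes.
    Process.sleep(200)
    # A PubSub broadcast announcing completion would go here.
    {:stop, :normal, thing_id}
  end
end

defmodule Loader do
  # Starts the Registry and DynamicSupervisor. In a real app these would
  # normally be children of the application supervisor instead.
  def start_link do
    children = [
      {Registry, keys: :unique, name: Loader.Registry},
      {DynamicSupervisor, name: Loader.WorkerSupervisor, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # Returns {:ok, pid}, or {:error, {:already_started, pid}} if a load
  # is already running for this Thing.
  def load(thing_id) do
    DynamicSupervisor.start_child(Loader.WorkerSupervisor, {Loader.Worker, thing_id})
  end
end
```

Because the name is registered before init/1 runs, a second Loader.load/1 call for the same id fails with :already_started, which gives the "only one load per Thing" guarantee on a single node.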

A followup question to think about: if the “Load” steps take more than a couple seconds, what should happen if the server wants to shut down while a load is running? You may need to adjust application + supervisor options to give things long enough to finish cleanly, or you’ll have to deal with partially-complete loads.
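On the supervisor side, one knob for this is the :shutdown value in the worker's child spec. A minimal sketch, assuming a hypothetical SlowLoad worker and an arbitrary 30-second allowance:

```elixir
defmodule SlowLoad do
  # `shutdown: 30_000` gives this worker up to 30 seconds to exit when its
  # supervisor shuts down, instead of the default 5 seconds.
  use GenServer, restart: :temporary, shutdown: 30_000

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg) do
    # Trap exits so that a callback already in progress can finish and
    # terminate/2 runs when the supervisor asks the worker to stop.
    Process.flag(:trap_exit, true)
    {:ok, arg}
  end

  @impl true
  def terminate(_reason, _state) do
    # Hypothetical cleanup, e.g. marking the load as interrupted.
    :ok
  end
end
```

If the worker has not exited when the allowance runs out, the supervisor kills it, so partially complete loads still need handling for the worst case.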

garrison · August 19, 2024, 3:32pm

To offer another option, I think your idea of using the database is a good approach, especially if you ever plan to distribute your app across multiple nodes.

Instead of using Task.async, you would start tasks under a supervisor using Task.Supervisor.start_child/2 so that they survive under the supervisor even if the LiveView process dies (and so that they don’t block the LiveView - perhaps counterintuitively, you still have to await tasks started with Task.async).
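As a rough illustration of the fire-and-forget behaviour, here is a standalone sketch. Demo.TaskSupervisor is a made-up name; in an application you would normally add {Task.Supervisor, name: MyApp.TaskSupervisor} to the supervision tree rather than calling start_link by hand.

```elixir
{:ok, _sup} = Task.Supervisor.start_link(name: Demo.TaskSupervisor)

caller = self()

# Fire and forget: the caller is not linked to the task and never awaits it.
{:ok, _pid} =
  Task.Supervisor.start_child(Demo.TaskSupervisor, fn ->
    Process.sleep(100)
    send(caller, :done)
  end)

# start_child/2 returned immediately; the caller only waits here so the
# sketch can observe the task finishing.
result =
  receive do
    :done -> :finished
  after
    1_000 -> :timeout
  end

IO.inspect(result)
```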

In order to guarantee that the work is only done once, you should perform your processing checks inside the task. You also need to be careful to use a lock on the initial check to ensure there are no race conditions because Postgres (and other relational dbs) do not use SERIALIZABLE isolation by default.

For example:

# Assumes `import Ecto.Query` is in scope for where and lock.
Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
  # Only match the row if it is not already processing, and lock it.
  query = where(Thing, id: ^thing_id, processing: false) |> lock("FOR UPDATE")
  {:ok, thing} = Repo.transaction(fn ->
    case Repo.one(query) do
      %Thing{} = thing ->
        Repo.update!(Ecto.Changeset.change(thing, %{processing: true}))
      nil ->
        raise "Thing already processing!"
    end
  end)
  # The lock is released here; the processing flag now guards the Thing.
  work = do_work(thing)
  Repo.update!(Ecto.Changeset.change(thing, %{work: work, processing: false}))
end)

Note that this code doesn’t handle failures, which is not trivial depending on your use case. If it’s okay to do the work more than once (as long as it’s only updated once), you can do the locking at the final update (or use Ecto’s optimistic_lock/3). You could also use a timeout to say when it’s safe to retry (if you need at most once within a time frame), assuming you trust your clocks.

LiveView processes are just GenServers, so you can literally just send a message back to the LV and accept it in handle_info.

lv_pid = self()
Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
  # do work...
  send lv_pid, {:work_complete, thing_id}
end)
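The receiving side could look roughly like this. To keep the sketch free of dependencies, a plain GenServer (the made-up FakeView module) stands in for the LiveView and Task.start/1 stands in for the supervised task; in a real LiveView the handle_info/2 clause would return {:noreply, socket} with updated assigns instead.

```elixir
defmodule FakeView do
  # Plain GenServer standing in for a LiveView process. Its state is the
  # set of thing ids whose "Load" button should currently be disabled.
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, MapSet.new())
  def loading(pid), do: GenServer.call(pid, :loading)
  def load(pid, thing_id), do: GenServer.call(pid, {:load, thing_id})

  @impl true
  def init(loading), do: {:ok, loading}

  @impl true
  def handle_call(:loading, _from, loading), do: {:reply, loading, loading}

  def handle_call({:load, thing_id}, _from, loading) do
    view = self()

    # Task.start/1 is used only to keep the sketch dependency-free.
    Task.start(fn ->
      Process.sleep(50)
      send(view, {:work_complete, thing_id})
    end)

    {:reply, :ok, MapSet.put(loading, thing_id)}
  end

  @impl true
  def handle_info({:work_complete, thing_id}, loading) do
    # Work is done: re-enable the button for this thing.
    {:noreply, MapSet.delete(loading, thing_id)}
  end
end
```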

If you need to update the page for multiple users, then you would use Phoenix PubSub.

As for writing to the same DB record twice: it’s perfectly fine. Obviously you should write to the DB as little as possible, but in this case two writes is the best you can do.