GenServer Controller and Worker idiomatic pattern desired

I have a controller and a worker module. The controller routes requests to the appropriate worker pid and handles worker crashes so the client handler doesn’t have to. It presents a single consistent interface to the client and lets the client continue processing when a worker crashes, since a new worker is started and used in its place.

When the worker crashes (for whatever reason), the controller catches the exit, thanks to the temporary monitor GenServer sets up under the covers while waiting for a response (thanks Sasa), and carries on.

However, there is an obvious race condition: if the Worker.Supervisor hasn’t created a new child by the next time get_worker_pid is called, the lookup finds no pid and raises, which crashes the Controller process. Under the covers, gproc handles process registration for the dynamically started worker processes.

Is there an idiomatic way to handle this cleanly?

[I’ve seen code that splits the pid lookup in two: once in the client API function and once more inside the serialized server call. I’ve also seen code that uses timeouts and returns an initial `:noreply`, replying later with `GenServer.reply`. I’m also wondering whether any of this pid retrieval should be moved into the `Worker.Supervisor` somehow… Anyway, I’d appreciate the standard go-to pattern here…]
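For reference, the "initial noreply, later `GenServer.reply`" approach mentioned above could look roughly like the sketch below. This is only an illustration under assumptions, not code from my project: `Sketch.DeferredController`, the `lookup` function (a stand-in for `Worker.whereis/1`), the 5 attempts and the 50 ms delay are all made up for the example.

```elixir
defmodule Sketch.DeferredController do
  use GenServer

  # `lookup` is a hypothetical 1-arity function standing in for Worker.whereis/1.
  def start_link(lookup), do: GenServer.start_link(__MODULE__, lookup)

  def action(server, id), do: GenServer.call(server, {:action, id})

  @impl true
  def init(lookup), do: {:ok, lookup}

  @impl true
  def handle_call({:action, id}, from, lookup) do
    # Defer the reply: keep `from` and try the lookup from handle_info/2.
    send(self(), {:try, id, from, 5})
    {:noreply, lookup}
  end

  @impl true
  def handle_info({:try, _id, from, 0}, lookup) do
    # Out of attempts: answer the waiting caller with an error tuple.
    GenServer.reply(from, {:error, :no_worker})
    {:noreply, lookup}
  end

  def handle_info({:try, id, from, attempts}, lookup) do
    case lookup.(id) do
      pid when is_pid(pid) ->
        # Worker is registered again; reply to the original caller.
        GenServer.reply(from, {:ok, pid})

      _ ->
        # Not registered yet; retry shortly without blocking the controller.
        Process.send_after(self(), {:try, id, from, attempts - 1}, 50)
    end

    {:noreply, lookup}
  end
end
```

The caller still sees an ordinary `GenServer.call`, and the total retry window has to stay below the call timeout (5 seconds by default), otherwise the caller exits before the reply arrives.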

Here’s the snippet of the controller module use case:

  def handle_call({:start_worker, id}, _from, state) do
    {:ok, _wpid} = Worker.Supervisor.start_child(id)

    {:reply, :ok, state}
  end

  def handle_call({:action, id, data}, _from, state) do
    response =
      try do
        get_worker_pid(id) |> Worker.action(data)
      catch
        :exit, reason ->
          Logger.info("Caught exit in controller, reason is #{inspect(reason)}")
          {:retry, reason}
      end

    {:reply, response, state}
  end

  defp get_worker_pid(id) do
    case Worker.whereis(id) do
      # A raise is an :error, not an :exit, so the catch above does not trap it.
      :undefined -> raise "Couldn't find worker pid"
      pid -> pid
    end
  end

Thanks B

Can you talk about the use case a bit more? It doesn’t sound like the process architecture you have here fits the scenario well, but without more information about that scenario it’s hard to advise further.

The controller is being called in a client loop and has a static id. The controller could be thought of as a proxy. The restarted worker is able to continue where the crashed worker left off.

To answer your question with another question: what is the controller pattern in OTP, if one exists? I believe this is different from a router.

Thanks again

I’m going to use a via tuple in my GenServer worker call instead of fiddling with the whereis call. I believe this should address the race condition. Cheers
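A minimal sketch of the via-tuple idea, for anyone landing here later. It uses the standard library `Registry` in place of gproc so that it runs without dependencies; with gproc the tuple would presumably look like `{:via, :gproc, {:n, :l, {:worker, id}}}`. The module names and the registry name `Sketch.WorkerRegistry` are assumptions made for the example, and `Sketch.Controller.action/2` is a plain function standing in for the `handle_call` body above.

```elixir
defmodule Sketch.Worker do
  use GenServer

  # Hypothetical registry name; the thread uses gproc instead of Registry.
  @registry Sketch.WorkerRegistry

  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: via(id))

  # The name is resolved when the call is made, so no separate whereis step.
  def action(id, data), do: GenServer.call(via(id), {:action, data})

  defp via(id), do: {:via, Registry, {@registry, id}}

  @impl true
  def init(id), do: {:ok, id}

  @impl true
  def handle_call({:action, data}, _from, id), do: {:reply, {:ok, id, data}, id}
end

defmodule Sketch.Controller do
  # Stand-in for the controller's handle_call body.
  def action(id, data) do
    Sketch.Worker.action(id, data)
  catch
    # An unregistered name makes GenServer.call exit with :noproc,
    # which lands here just like a worker crash during the call.
    :exit, reason -> {:retry, reason}
  end
end
```

After starting the registry with `Registry.start_link(keys: :unique, name: Sketch.WorkerRegistry)`, calling `Sketch.Controller.action("a", :ping)` before the worker exists returns a `{:retry, reason}` tuple, and the same call succeeds once `Sketch.Worker.start_link("a")` has run. The window between a crash and the restart is still there; the difference is that a missing worker now surfaces as an exit the existing `catch :exit` clause handles, so the controller stays up.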