Detecting the GenServer/process died the correct way (from caller)

hubertlepicki · May 13, 2016, 6:11am

So I have a bunch of GenServers, registered in a pg2 registry. I am implementing a flavor of PubSub, but with priorities, and some of the calls, if not all, may be performed synchronously.

Basically I want to find a bunch of GenServers that registered in my pg2 under some name, and then call them one by one, each after the previous call has finished, so that only one thing is happening at a time.

My GenServers are already supervised and get re-initialized when one of them dies.

The problem is: how do I correctly respond to one of the GenServers going tits up while it’s processing a call? I can do the following:

> iex(3)> try do                                             
> ...(3)>   GenServer.call(pid, {:event, %{elo: "ziom"}})
> ...(3)> catch
> ...(3)>   :exit, _ -> IO.puts("GenServer died")
> ...(3)> end
> GenServer died

So, I can use a try/catch with an :exit clause. The docs say that doing so is extremely rare.

Is this the correct way of handling GenServer crashes while they process my calls, or is there a better way?

Qqwy · May 13, 2016, 7:07am

Yes and no. There exists another way to trigger functionality when processes die: Links and Monitors.

A Link is bidirectional: if one of the processes dies, the one connected to it with a link dies as well. If exit trapping is enabled, however, the linked process does not die when it gets an exit signal; an {:EXIT, from, reason} message is sent to it instead.

This is what Supervisors use internally.
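A minimal sketch of the link-plus-trap-exit behaviour described above. The module and function names (`LinkSketch`, `watch_linked/1`) are hypothetical and only serve the illustration; the one-second wait is an arbitrary assumption.

```elixir
defmodule LinkSketch do
  # Hypothetical helper: links to a freshly spawned process and reports
  # how it exited, instead of letting the exit signal kill the caller.
  def watch_linked(fun) do
    # With :trap_exit set, exit signals from linked processes arrive as
    # {:EXIT, pid, reason} messages. Process.flag/2 returns the old value.
    previous = Process.flag(:trap_exit, true)
    pid = spawn_link(fun)

    result =
      receive do
        {:EXIT, ^pid, reason} -> {:exited, reason}
      after
        1_000 -> :still_running
      end

    # Restore whatever the flag was before this function ran.
    Process.flag(:trap_exit, previous)
    result
  end
end
```

For example, `LinkSketch.watch_linked(fn -> exit(:boom) end)` returns `{:exited, :boom}`.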

A Monitor is one-way. Whenever the process that is being monitored exits, a {:DOWN, ref, :process, pid, reason} is sent to all processes that monitor it. This is probably what you want to use here.
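A small sketch of monitoring a process by hand and waiting for its :DOWN message. `MonitorSketch` and `await_down/2` are made-up names for the example, and the default timeout is an assumption.

```elixir
defmodule MonitorSketch do
  # Hypothetical helper: waits up to `timeout` ms for `pid` to go down.
  def await_down(pid, timeout \\ 1_000) do
    ref = Process.monitor(pid)

    receive do
      # The monitoring process gets this message when `pid` exits.
      # If `pid` was already dead, the reason is :noproc.
      {:DOWN, ^ref, :process, ^pid, reason} -> {:down, reason}
    after
      timeout ->
        # Stop monitoring and drop any :DOWN message that raced the timeout.
        Process.demonitor(ref, [:flush])
        :alive
    end
  end
end
```

Unlike a link, the monitor never takes the watching process down with it; the caller only gets a message to react to.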

However, Links and Monitors dispatch a new message to the current process. It therefore is not possible to check if the GenServer died because of a call you made inside the very function you’re now executing.
As GenServer.call returns the server's reply on success, and exits in the calling process (with a reason such as {reason, {GenServer, :call, args}}) when the server dies during the call, I believe that it is okay to catch it like above.

(Disclaimer: I am but a mere OTP-greenhorn myself. Someone more knowledgeable might correct me)

rvirding · May 13, 2016, 8:02pm

The GenServer already uses a monitor internally. When you do a GenServer.call the server will be monitored until the reply is received or the call times out[*]. If the server was dead or dies when processing your request then the monitor will detect this and the call function will exit, crashing the caller. You then have to decide how to handle that: either just let the process crash, or catch the exit in a try ... catch and clean up. Usually crashing is what you want. When the reply is received, or the call times out, the monitor is removed so it will not affect you afterwards.
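For the "call them one by one" case from the original question, the try ... catch option could be sketched roughly like this. `SequentialCalls`, `call_each/3` and the `EchoServer` demo module are hypothetical names, and the list of pids is assumed to have been looked up elsewhere (the registry lookup is left out).

```elixir
defmodule EchoServer do
  # Hypothetical demo server, only here to make the sketch runnable.
  use GenServer

  def init(state), do: {:ok, state}

  def handle_call({:event, payload}, _from, state) do
    {:reply, {:handled, payload}, state}
  end
end

defmodule SequentialCalls do
  # Hypothetical helper: calls each server in order, waiting for each
  # reply before moving on. An exit from GenServer.call/3 (dead server,
  # crash during the call, timeout) is recorded instead of crashing us.
  def call_each(pids, request, timeout \\ 5_000) do
    Enum.map(pids, fn pid ->
      try do
        {:ok, GenServer.call(pid, request, timeout)}
      catch
        :exit, reason -> {:error, reason}
      end
    end)
  end
end
```

Each element of the result is either `{:ok, reply}` or `{:error, exit_reason}`, so the caller can decide per server whether to skip it, retry, or give up.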

Robert

[*] There is a GenServer.call/3 function where the 3rd argument is a timeout value saying how long you want to wait for a reply before giving up and crashing. The default timeout value in GenServer.call/2 is 5000 msecs, so beware of this and don’t let your server take too long.
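A rough sketch of passing an explicit timeout and catching only the timeout exit. `SlowServer` and `TimeoutSketch` are invented for the example; the sleep simply stands in for slow work.

```elixir
defmodule SlowServer do
  # Hypothetical demo server that takes as long as it is told to.
  use GenServer

  def init(state), do: {:ok, state}

  def handle_call({:sleep, ms}, _from, state) do
    Process.sleep(ms)
    {:reply, :done, state}
  end
end

defmodule TimeoutSketch do
  # Hypothetical helper: turns a call timeout into an error tuple.
  # Other exits (for example a dead server) are not caught here and
  # still crash the caller.
  def call_with_timeout(pid, request, timeout) do
    try do
      {:ok, GenServer.call(pid, request, timeout)}
    catch
      # On timeout the exit reason is {:timeout, {GenServer, :call, args}}.
      :exit, {:timeout, _call_info} -> {:error, :timeout}
    end
  end
end
```

Note that a timeout only means the caller stopped waiting; the server itself keeps working on the request.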

michalmuskala · May 20, 2016, 6:41pm

When catching errors from GenServer.call you need to be careful about replies. The GenServer may still send you a message with a reply later on, so you need to take care of properly ignoring it.

rvirding · June 21, 2016, 9:48pm

Actually this is no problem as the reply contains a unique reference which will only be received by the call which sent the request. This means that future calls will never receive that message. Of course it will still be in the message queue.
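If the caller is itself a GenServer, a stray reply left in the message queue ends up in handle_info. A minimal sketch of ignoring it, assuming the late reply arrives as a `{ref, reply}` tuple (newer OTP releases may drop such replies before they reach the mailbox at all). `Dispatcher` is a hypothetical module name.

```elixir
defmodule Dispatcher do
  # Hypothetical caller process that makes GenServer calls of its own.
  use GenServer

  def init(state), do: {:ok, state}

  def handle_call(:ping, _from, state), do: {:reply, :pong, state}

  # Assumption: a reply to a call we already gave up on shows up as
  # {ref, reply}. It matches no pending call, so just discard it.
  def handle_info({ref, _late_reply}, state) when is_reference(ref) do
    {:noreply, state}
  end

  # Ignore anything else unexpected rather than crashing.
  def handle_info(_other, state), do: {:noreply, state}
end
```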