How can I monitor when a process is restarted

How can I monitor if a GenServer process is started or restarted by a supervisor?

I was thinking about passing the observer pid to the GenServer and sending a message from the init function. However, this would mean the message is sent shortly before the GenServer is started. Using send_after would mean the message could be sent too late.

Typically this problem is solved by registering the process with Process.register/2 (or by passing a :name option when it is created). If the pid of a named process is required, it can be obtained with Process.whereis/1.
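
For example, a minimal sketch (MyApp.Worker is just a placeholder module name) where the worker registers itself under its module name, so a restarted instance is always reachable the same way:

    defmodule MyApp.Worker do
      use GenServer

      # Register under the module name so callers never need to track the pid
      def start_link(_opts) do
        GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
      end

      @impl true
      def init(:ok), do: {:ok, %{}}
    end

    # After a crash the supervisor restarts the worker under the same name,
    # so callers can simply look it up again:
    pid = Process.whereis(MyApp.Worker)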

Maybe I should ask: why does your process need to be informed that a replacement has taken place?

My problem is not only to keep a constant “reference” to a process, which can be done with a name as you suggest, but also to monitor when it is restarted.

At least you can get a notification when the old one dies with Process.monitor/1.
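
A rough sketch, assuming the worker is registered under the (made-up) name MyApp.Worker:

    pid = Process.whereis(MyApp.Worker)
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} ->
        IO.puts("worker went down: #{inspect(reason)}")
    end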

One trick is to have an ETS table that is owned by the supervisor but managed by the supervised process. So store the observing pids in the ETS table and have the fresh process send notifications to the observing processes about “the change in management”.

See Steve Vinoski: Don’t Lose Your ETS Tables

The observing processes would still have to use monitors if they need to know quickly that the old process has terminated.
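
A minimal sketch of the idea (the module names and the :observers table name are made up): the supervisor owns the table so it survives worker crashes, observers register themselves in it, and every fresh worker notifies them from init/1.

    defmodule MyApp.Sup do
      use Supervisor

      def start_link(_), do: Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)

      @impl true
      def init(:ok) do
        # The supervisor owns the table, so it outlives worker restarts
        :ets.new(:observers, [:named_table, :public, :set])
        Supervisor.init([MyApp.Worker], strategy: :one_for_one)
      end
    end

    defmodule MyApp.Worker do
      use GenServer

      def start_link(_), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

      @impl true
      def init(:ok) do
        # Every (re)start announces the new pid to the registered observers
        for {pid} <- :ets.tab2list(:observers), do: send(pid, {:worker_up, self()})
        {:ok, %{}}
      end
    end

    # An observer registers itself once and then receives {:worker_up, pid}
    # whenever a new worker instance comes up:
    :ets.insert(:observers, {self()})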


I think you should also be able to catch terminate/2 in your process, especially if you use Process.flag(:trap_exit, true), but without seeing code I’m not 100% sure.

An exit signal happens for linked processes. But it’s important to remember:

  • The primary intent of an exit signal is to terminate all processes that are linked together - because “the whole” cannot succeed when a “part” is missing.
  • Trapping the exit signal transforms it to an EXIT message. Often this is done so that a process can terminate in a graceful manner - e.g. to release precious resources before terminating itself.
  • A link is a process relationship that works both ways - either process terminating will result in an exit signal to the other.
  • A monitor is a one way process relationship - only the monitoring process will be informed with a DOWN message of the demise of the other - not the other way around.
  • Even though a supervisor doesn’t want to be terminated by its children (i.e. the supervisor traps exits), it does want to take down ALL its children if the supervisor unexpectedly goes down. So it makes perfect sense to use links rather than monitors in supervisors.
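
A small illustration of the difference, runnable in iex:

    # A link is two-way; trapping exits turns the exit signal into a message
    Process.flag(:trap_exit, true)
    child = spawn_link(fn -> exit(:boom) end)

    receive do
      {:EXIT, ^child, reason} -> IO.puts("linked child exited: #{inspect(reason)}")
    end

    # A monitor is one-way; only the monitoring process hears about the death
    {pid, ref} = spawn_monitor(fn -> exit(:boom) end)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> IO.puts("monitored process down: #{inspect(reason)}")
    end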

I would like for the observer process to count the number of times a process is restarted or killed. This way the observer process could avoid invoking the faulty process for a few seconds (if it fails too often) and use other processes (in my problem they are all similar but use different “channels” to send data).

Monitoring with Process.monitor/1 is not enough: once the process is killed and restarted, Process.monitor/1 would have to be called again on the new process, but we can’t do that because there is no event telling us when the restart happens.

This sounds a bit strange for an architecture. Why does the process crash loop? If it uses an external system that can be unavailable, then it should explicitly handle that situation and return e.g. {:error, :unavailable}, rather than crash. Then the client is free to handle that and choose to do something else. See “It’s about the guarantees”.
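
For instance, a hedged sketch (do_send/1 stands in for whatever third-party call is actually being made): the worker catches the failure and replies with a tagged error instead of crashing.

    defmodule MyApp.Channel do
      use GenServer

      def start_link(_), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

      @impl true
      def init(:ok), do: {:ok, %{}}

      @impl true
      def handle_call({:send, payload}, _from, state) do
        # Turn a failing external call into a value the caller can act on
        reply =
          try do
            {:ok, do_send(payload)}
          rescue
            _ -> {:error, :unavailable}
          end

        {:reply, reply, state}
      end

      # Placeholder for the real third-party call
      defp do_send(payload), do: payload
    end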

init runs inside the GenServer process, so it definitely won’t do anything “before the GenServer is started”.


This way the observer process could avoid invoking the faulty process for a few seconds (if it fails too often) and use other processes (in my problem they are all similar but use different “channels” to send data).

You might want to take a look at circuit breaker implementations.
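
Just to illustrate the idea, a minimal hand-rolled sketch (not production-ready; MyApp.Breaker and the thresholds are made up): after too many consecutive failures it refuses calls for a few seconds, which is the behaviour described above.

    defmodule MyApp.Breaker do
      use GenServer

      @max_failures 5
      @open_for_ms 5_000

      def start_link(_), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

      # Run the function only if the breaker is closed; record the outcome
      def call(fun) do
        case GenServer.call(__MODULE__, :ask) do
          :ok ->
            try do
              result = fun.()
              GenServer.cast(__MODULE__, :success)
              {:ok, result}
            rescue
              e ->
                GenServer.cast(__MODULE__, :failure)
                {:error, e}
            end

          :open ->
            {:error, :circuit_open}
        end
      end

      @impl true
      def init(:ok), do: {:ok, %{failures: 0, open_until: nil}}

      @impl true
      def handle_call(:ask, _from, %{open_until: nil} = state), do: {:reply, :ok, state}

      def handle_call(:ask, _from, %{open_until: until} = state) do
        if System.monotonic_time(:millisecond) >= until do
          # Cool-off period is over; close the breaker again
          {:reply, :ok, %{state | open_until: nil, failures: 0}}
        else
          {:reply, :open, state}
        end
      end

      @impl true
      def handle_cast(:failure, %{failures: n} = state) when n + 1 >= @max_failures do
        # Too many failures in a row: back off from this channel for a while
        {:noreply, %{state | failures: 0, open_until: System.monotonic_time(:millisecond) + @open_for_ms}}
      end

      def handle_cast(:failure, state), do: {:noreply, %{state | failures: state.failures + 1}}
      def handle_cast(:success, state), do: {:noreply, %{state | failures: 0}}
    end

A caller could then wrap each send in MyApp.Breaker.call(fn -> ... end) and fall back to another channel on {:error, :circuit_open}.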


Thanks for your helpful comment. I don’t think I have that much control over detecting when the external systems become unavailable, as I am using third-party libs to communicate with them.

Thanks for the article.


Wouldn’t this just be a normal Supervisor? Why not make your own supervisor implementation?
