Designing Supervision Trees - a case for one-way links?

I’m going to tack onto this thread a question I have.

I keep running into situations where I want, essentially, a one-way link. I want this process to die if any of several other, non-supervisor processes happens to die (which should be rare, but it can happen). But I don’t want the death of this unimportant process to kill any of those much more important processes, as that would cause a cascade of deaths through my system.

I could do it with monitors, except that every process that monitors something else also has to implement a handler for :DOWN messages, and if it already monitors other things, it has to do bookkeeping to work out whether the pid that went down is one that should crash it.
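To illustrate the bookkeeping burden, here is a minimal sketch (module name `Watcher` and its API are made up for illustration) of what a monitoring process ends up carrying: it has to remember which monitor refs are “critical” and decide, for every `:DOWN` message, whether to crash or ignore it.

```elixir
defmodule Watcher do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Monitor a pid whose death should take this process down too.
  def watch(server, pid), do: GenServer.call(server, {:watch, pid})

  @impl true
  def init(_opts), do: {:ok, %{critical: %{}}}

  @impl true
  def handle_call({:watch, pid}, _from, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, put_in(state.critical[ref], pid)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    case Map.pop(state.critical, ref) do
      {nil, _} ->
        # A :DOWN for some other, non-critical monitor: ignore it.
        {:noreply, state}

      {_pid, critical} ->
        # A process we depend on died: stop so our supervisor restarts us.
        {:stop, {:critical_dependency_down, reason}, %{state | critical: critical}}
    end
  end
end
```

Every process that wants this behaviour has to repeat roughly this same dance, which is the duplication being complained about.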

In some cases I can trap_exit and track pids, but the logic is difficult and fragile.

I have been thinking about how hard it would be to implement a one-way link, and it doesn’t seem difficult to me. Does something like that already exist?


This suggests to me that you may be using sub-optimal linkages. It’s called a “supervision tree”, not a “supervision mesh” - the idea being that non-normal process death propagates up the tree until it reaches a process trapping the exit (usually a supervisor, but not always) that knows how to deal with it.

So the best way to approach this discussion is to describe concrete scenarios where you feel the concept of a half-link would be invaluable (and make a case for why you feel that way), giving the community a chance to either agree with you or explain how they would approach the challenge within the means that currently exist.


My use case is that there are thousands of GenServers holding reasonable amounts of data that is often expensive or impractical to retrieve initially. Other processes subscribe to portions of the data these processes hold, get an initial subset of the data they need from each, and then receive updates whenever an event occurs that might change that data. One process might subscribe to up to 20 or so different pieces of state, and will continually subscribe and unsubscribe throughout its life.

The problem is that if the state-holding process dies, this process will no longer receive updates from it. Obviously I want it to die so that it will restart and resubscribe to all the data it needs. But if it were also to kill the thing it subscribed to, it would needlessly kill everything else subscribed to that process, and then all the other data holders would die in a six-degrees-of-Kevin-Bacon scenario.

My problem with monitors is that their logic is spread across every process that uses them, and that seems error-prone. If I get it wrong, I could end up with processes that didn’t die but mysteriously aren’t getting events from some places. I do not want to debug that. Much better to have the kill logic in its own process. As a bonus, I could also query it to see how “important” a particular process is by building a graph of the processes that depend on its state.
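For what it’s worth, the “kill logic in its own process” idea can be sketched in a few lines. The module and function names here (`OneWayLink.depend_on/2`) are invented: a small watcher process monitors both sides and kills the dependent when the important process dies, while the important process never notices the dependent.

```elixir
defmodule OneWayLink do
  # dependent dies if important dies; important never hears about dependent.
  def depend_on(dependent, important) do
    spawn(fn ->
      imp_ref = Process.monitor(important)
      dep_ref = Process.monitor(dependent)

      receive do
        {:DOWN, ^imp_ref, :process, _pid, reason} ->
          # The important process died: take the dependent down with it.
          # Note: if the dependent traps exits, this arrives as an
          # {:EXIT, ...} message instead of killing it.
          Process.exit(dependent, {:dependency_down, reason})

        {:DOWN, ^dep_ref, :process, _pid, _reason} ->
          # The dependent died first: nothing to do, just let the watcher exit.
          :ok
      end
    end)
  end
end
```

A real library would need to handle things like re-establishing links across restarts and many-to-one dependencies, but the core mechanism is just a monitor plus a targeted `Process.exit/2`.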

I think I’ve already decided I’m going to write this one-way link library if it doesn’t already exist somewhere, unless someone can give me a really good reason not to.

One possible way to deal with that particular scenario is to place the subscription information in an ETS table owned by the supervisor of the GenServer that uses it. That way, when the supervisor restarts the GenServer, the subscription information is ready to go and is picked up by the new process. I’m not saying it’s necessarily the best or most elegant approach, but it’s an option.
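Roughly like this, as a sketch - the module and table names (`StateHolder`, `:subscriptions`) are made up. The supervisor owns the table, so it survives child restarts, and the restarted GenServer re-reads its subscriber list during `init/1`:

```elixir
defmodule StateHolder.Supervisor do
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg) do
    # The table is owned by the supervisor, which should rarely crash,
    # so the subscription data outlives any child restart.
    :ets.new(:subscriptions, [:named_table, :public, :bag])
    Supervisor.init([StateHolder], strategy: :one_for_one)
  end
end

defmodule StateHolder do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)

  def subscribe(pid) do
    :ets.insert(:subscriptions, {:sub, pid})
    GenServer.cast(__MODULE__, {:subscribed, pid})
  end

  @impl true
  def init(_arg) do
    # On (re)start, pick up whatever subscriptions were registered
    # before the previous incarnation died.
    subscribers = for {:sub, pid} <- :ets.lookup(:subscriptions, :sub), do: pid
    {:ok, %{subscribers: subscribers}}
  end

  @impl true
  def handle_cast({:subscribed, pid}, state) do
    {:noreply, update_in(state.subscribers, &[pid | &1])}
  end
end
```

With this in place, killing the `StateHolder` and letting the supervisor restart it leaves the subscriber list intact.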

I would also be inclined to monitor the subscribers so that they are removed from the subscription list as soon as they terminate. It may even make sense to have a separate process manage the subscriptions and act as a “portal” to the process holding the “high-value state” - essentially making each of them simpler, giving them fewer reasons to crash.
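The subscriber-monitoring part might look something like this sketch (`Publisher` and its API are hypothetical names): a monitor per subscriber, and the `:DOWN` handler prunes the dead pid so updates are never sent to it.

```elixir
defmodule Publisher do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def subscribe(server, pid \\ self()), do: GenServer.call(server, {:subscribe, pid})
  def subscribers(server), do: GenServer.call(server, :subscribers)

  @impl true
  def init(_opts), do: {:ok, %{subs: %{}}}

  @impl true
  def handle_call({:subscribe, pid}, _from, state) do
    # Keyed by monitor ref so the :DOWN handler can find the entry cheaply.
    ref = Process.monitor(pid)
    {:reply, :ok, put_in(state.subs[ref], pid)}
  end

  def handle_call(:subscribers, _from, state) do
    {:reply, Map.values(state.subs), state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # A subscriber died: drop it so we never publish to a dead pid.
    {:noreply, %{state | subs: Map.delete(state.subs, ref)}}
  end
end
```

Moving this into a separate “portal” process, as suggested above, would keep the high-value state holder free of any subscription logic at all.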
