Bigger picture, I wonder if you might offer your thoughts on this. I find myself struggling with several architecture issues in Erlang/Elixir which I don’t find well covered. The underlying issue is largely about understanding the detail of exit signals, usually in the context of either needing to clean up some resource or cascading a failure to something else in a controlled way.
1) Given a manager process that starts worker processes: if workers dying shouldn’t take down the manager process, then it would seem like the right structure is to start the workers under their own supervisor. However:
Q: If you wanted the termination of the manager to stop the children, how would you construct this? I struggle with knowing how many ways the manager could die, and whether any of them need special handling in order to guarantee cleanup of the workers. Would it be enough to have the manager and the child supervisor started under a one-for-all supervisor?
Q: How and where should the code live to terminate the whole structure, e.g. if we want to implement a “graceful stop” function for the whole subsystem? Should the manager send a message to its parent supervisor, which handles the shutdown of everything? Should it instead notify its own children directly? (Sure, I realise different apps have different needs; I’m more keen to avoid patterns which don’t work well due to races, etc.)
Q: How to handle shutdown of the whole app? I think I keep hitting problems where I’m trying to restart stuff as the system is stopping it. This is quite possibly due to how I’m handling exits (misunderstanding the exit signals). I think during shutdown, things are killed in reverse order? So if I had a supervisor creating a manager process and a children supervisor, then my children would get killed by the Erlang runtime first, but if I’m monitoring them, how do I avoid restarting them?
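To make the structure in 1) concrete, here is a minimal sketch of the one-for-all layout being asked about. Every module name (`Subsystem.Supervisor`, `Subsystem.Manager`, `Subsystem.WorkerSupervisor`) is hypothetical, and an `Agent` stands in for a real worker; it is an illustration under those assumptions rather than a settled answer to the questions above.

```elixir
defmodule Subsystem.Manager do
  # Hypothetical manager: starts workers under a separate DynamicSupervisor
  # and only monitors them, so a worker crash cannot take the manager down.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def start_worker(arg), do: GenServer.call(__MODULE__, {:start_worker, arg})

  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_call({:start_worker, arg}, _from, state) do
    # An Agent is a placeholder for the real worker module (assumption).
    # restart: :temporary leaves any restart decision to the manager.
    spec = %{
      id: :worker,
      start: {Agent, :start_link, [fn -> arg end]},
      restart: :temporary
    }

    {:ok, pid} = DynamicSupervisor.start_child(Subsystem.WorkerSupervisor, spec)
    ref = Process.monitor(pid)
    {:reply, {:ok, pid}, Map.put(state, ref, pid)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # A real manager would decide here whether to start a replacement.
    {:noreply, Map.delete(state, ref)}
  end
end

defmodule Subsystem.Supervisor do
  # Hypothetical top-level supervisor for the subsystem.
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      # Started first, therefore stopped last when the tree shuts down.
      {DynamicSupervisor, name: Subsystem.WorkerSupervisor, strategy: :one_for_one},
      # Started last, therefore stopped first when the tree shuts down.
      Subsystem.Manager
    ]

    # :one_for_all: if either child terminates, the other is stopped too
    # and both are restarted together.
    Supervisor.init(children, strategy: :one_for_all)
  end
end
```

With this child order the manager is stopped before the worker supervisor during an orderly shutdown, because supervisors stop their children in reverse start order, so the manager is no longer around to react to workers going down. A “graceful stop” of the whole structure could then be as small as a call to `Supervisor.stop(Subsystem.Supervisor)`.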
2) Resource cleanup tends to leave me wanting to handle exit signals, which then leads to needing to understand the semantics of those to a very high degree…
Q: If you needed to create a one-way link, how would you go about it? E.g. I have some GenServer which needs another dynamically started GenServer that effectively wraps some OS resource. Say I want to monitor an LTE modem, so I request one of the 7 QMI handles that the OS can allocate; the child process then uses this handle to do some monitoring and sends the answer to the parent. If the child process dies, it needs to handle its exit and release the handle. If the parent dies, I need to stop the monitoring and release the handle. If the child process dies, I would also want to restart it from the parent (as the parent and child need to have knowledge of each other to send messages, etc.).
This feels like a need for a one-way link? I’m not sure how to model this without just trapping exits, which as your article shows is problematic and easy to get wrong… I did ponder whether I could model this with a dynamic supervisor starting my resource GenServer and the parent “monitoring” it, then set up a separate process that is a) “linked” to the child and b) monitoring the parent server process… However, this feels ugly and racy to start up.
I can construct similar problems of how to handle a scarce resource which is important to deallocate, and given that anything can be arbitrarily killed without running the terminate function, there isn’t a lot of guidance on how to wrap resources to ensure that they are cleaned up…
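For illustration, the “separate process” idea above could be sketched with a monitor alone, so that neither side has to trap exits. `HandleGuard` and its `release` callback are hypothetical names; the callback is a stand-in for whatever actually frees the QMI handle. This is a sketch under those assumptions, not a complete answer to the cleanup problem.

```elixir
defmodule HandleGuard do
  # Hypothetical guard process: it watches the process that uses a scarce
  # resource and runs the release function once that process is gone.
  use GenServer

  # `owner` is the pid using the resource; `release` is a zero-arity
  # function that frees it. Started unlinked, so the guard outlives the owner.
  def start(owner, release) when is_pid(owner) and is_function(release, 0) do
    GenServer.start(__MODULE__, {owner, release})
  end

  @impl true
  def init({owner, release}) do
    # If the owner is already dead, the :DOWN message arrives immediately
    # with reason :noproc, so the startup ordering cannot lose the cleanup.
    ref = Process.monitor(owner)
    {:ok, %{ref: ref, release: release}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    # Runs for any exit reason, including an untrappable :kill of the owner.
    state.release.()
    {:stop, :normal, state}
  end
end
```

The guard itself can still be killed before it runs, so this narrows the window rather than closing it. The same monitor-based approach can be pointed the other way: the child can monitor its parent in `init/1` and stop itself when the parent’s `:DOWN` message arrives.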
I do like Sasa’s “parent” library; it tackles some of these concerns directly. However, your article series is very helpful, as it is a bit of an authoritative document on how exit signals function. (One caveat: I believe there is a difference in behaviour for :EXIT messages when the sender pid is the parent process, which causes the message to be converted to a “kill”, i.e. the child can’t trap it? The explanation I was given is that this is a BEAM behaviour, so it happens out of sight of your app.)