I created a tiny library which supervises “singleton” processes, using Erlang’s global module.
Processes started with Singleton.start_child/3 are guaranteed to run on exactly one node in your Elixir cluster. The nodes that don’t run the process monitor it, so that when it dies it gets respawned, possibly on another node. This monitoring part is something that global doesn’t do for you.
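To make the "register globally, monitor everywhere else" idea concrete, here is a minimal sketch. It is not the library's actual implementation: the module names GlobalSketch and GlobalSketch.Worker and the function start_or_monitor/1 are made up for illustration. It relies only on GenServer's name: {:global, term} registration and Process.monitor/1, and it can be tried on a single, non-distributed node.

```elixir
defmodule GlobalSketch.Worker do
  use GenServer

  # Hypothetical worker; it just holds whatever state it is given.
  def init(arg), do: {:ok, arg}
end

defmodule GlobalSketch do
  # Try to start the worker under a cluster-wide :global name.
  # If some process already holds that name, monitor it instead,
  # so the caller gets a :DOWN message when it dies and can retry.
  def start_or_monitor(name) do
    case GenServer.start(GlobalSketch.Worker, nil, name: {:global, name}) do
      {:ok, pid} ->
        {:started, pid}

      {:error, {:already_started, pid}} ->
        {:monitoring, pid, Process.monitor(pid)}
    end
  end
end

# First caller wins the name, second caller ends up monitoring it.
{:started, pid} = GlobalSketch.start_or_monitor(:demo_singleton)
{:monitoring, ^pid, ref} = GlobalSketch.start_or_monitor(:demo_singleton)

GenServer.stop(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} ->
    # This is the point where a monitoring node would try to respawn.
    GlobalSketch.start_or_monitor(:demo_singleton)
after
  1_000 -> :timeout
end
```

In a real cluster the second call would typically come from another node, and the respawn attempt after :DOWN is what moves the process over.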
And one of the nodes is basically left in a state where my own application is not running; only the “libraries” continue running.
I’ve been struggling with the same issue in my own implementation.
I created a DynamicSupervisor for the singleton process, similar to what you did, hoping that it would “shield” the main Application supervisor from crashing when the :global registry kills one of the singleton processes.
However, that doesn’t seem to be the case: the whole application dies regardless of the multiple layers of supervisors.
What is restart set to in the “singleton” GenServer? If it’s set to the default :permanent, then the supervisor on the joined node will keep restarting the process over and over until it exceeds its restart limit and gives up too. That failure then cascades up the chain of supervisors until the whole application stops.
Does the behavior change if you specify use GenServer, restart: :temporary?
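For reference, a minimal sketch of what that would look like. The module name TemporarySketch.Worker is hypothetical, and the Process.exit(pid, :kill) call only stands in for :global killing the process. The restart: :temporary option passed to use GenServer ends up in the generated child_spec/1, so a supervisor will not try to restart the child when it dies.

```elixir
defmodule TemporarySketch.Worker do
  # restart: :temporary is picked up by the generated child_spec/1
  use GenServer, restart: :temporary

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The child spec now carries the restart strategy.
:temporary = TemporarySketch.Worker.child_spec(nil).restart

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {TemporarySketch.Worker, nil})

# Simulate the process being killed from outside.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
after
  1_000 -> :timeout
end

# The supervisor itself survives and makes no restart attempt,
# so nothing counts against its restart limit.
true = Process.alive?(sup)
```

With :temporary the respawn has to come from somewhere else (for example the monitoring logic), since the supervisor no longer does it.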