I guess @Ciboulette’s point is that he was testing the fault tolerance of every part of his system, and it’s very odd that killing a Registry
shuts down the whole application. He’s not doing that in normal operation, only probing possible failure points.
Faults can happen anywhere, and fault tolerance is a basic property of the BEAM, which is why I’m so intrigued by this.
So I managed to reproduce the problem and traced it:
15:48:56:912029 (<0.132.0>) getting_unlinked <0.133.0>
15:48:56:912032 (<0.132.0>) << {'EXIT',<0.133.0>,killed}
15:48:56:912207 (<0.132.0>) spawn <0.552.0> as proc_lib:init_p('Elixir.Toto.Supervisor',[<0.131.0>],gen,init_it,[gen_server,<0.132.0>,<0.132.0>,
{local,'Elixir.Worker.ProcessRegistry'},
supervisor,
{{local,'Elixir.Worker.ProcessRegistry'},
'Elixir.Registry.Supervisor',
{unique,'Elixir.Worker.ProcessRegistry',1,[],[{-1,{unique,1,nil,nil,[]}},{-2,{unique,1,nil}}]}},
[]])
15:48:56:912226 (<0.132.0>) link <0.552.0>
15:48:56:912233 (<0.132.0>) out {proc_lib,sync_wait,2}
15:48:56:912724 (<0.132.0>) in {proc_lib,sync_wait,2}
15:48:56:912731 (<0.132.0>) << {ack,<0.552.0>,
{error,
{shutdown,
{failed_to_start_child,
'Elixir.Worker.ProcessRegistry.PIDPartition0',
{already_started,<0.134.0>}}}}}
15:48:56:912737 (<0.132.0>) getting_unlinked <0.552.0>
15:48:56:912738 (<0.132.0>) << {'EXIT',<0.552.0>,
{shutdown,
{failed_to_start_child,
'Elixir.Worker.ProcessRegistry.PIDPartition0',
{already_started,<0.134.0>}}}}
15:48:56:912762 (<0.132.0>) <0.132.0> ! {'$gen_cast',
{try_again_restart,'Elixir.Worker.ProcessRegistry'}}
15:48:56:912769 (<0.132.0>) << {'$gen_cast',{try_again_restart,'Elixir.Worker.ProcessRegistry'}}
15:48:56:912788 (<0.132.0>) spawn <0.553.0> as proc_lib:init_p('Elixir.Toto.Supervisor',[<0.131.0>],gen,init_it,[gen_server,<0.132.0>,<0.132.0>,
{local,'Elixir.Worker.ProcessRegistry'},
supervisor,
{{local,'Elixir.Worker.ProcessRegistry'},
'Elixir.Registry.Supervisor',
{unique,'Elixir.Worker.ProcessRegistry',1,[],[{-1,{unique,1,nil,nil,[]}},{-2,{unique,1,nil}}]}},
[]])
15:48:56:912793 (<0.132.0>) link <0.553.0>
15:48:56:912797 (<0.132.0>) out {proc_lib,sync_wait,2}
15:48:56:913611 (<0.132.0>) in {proc_lib,sync_wait,2}
15:48:56:913622 (<0.132.0>) << {ack,<0.553.0>,
{error,
{shutdown,
{failed_to_start_child,
'Elixir.Worker.ProcessRegistry.PIDPartition0',
{already_started,<0.134.0>}}}}}
15:48:56:913632 (<0.132.0>) getting_unlinked <0.553.0>
15:48:56:913635 (<0.132.0>) << {'EXIT',<0.553.0>,
{shutdown,
{failed_to_start_child,
'Elixir.Worker.ProcessRegistry.PIDPartition0',
{already_started,<0.134.0>}}}}
15:48:56:913692 (<0.132.0>) <0.132.0> ! {'$gen_cast',
{try_again_restart,'Elixir.Worker.ProcessRegistry'}}
15:48:56:913702 (<0.132.0>) << {'$gen_cast',{try_again_restart,'Elixir.Worker.ProcessRegistry'}}
15:48:56:913733 (<0.132.0>) spawn <0.554.0> as proc_lib:init_p('Elixir.Toto.Supervisor',[<0.131.0>],gen,init_it,[gen_server,<0.132.0>,<0.132.0>,
{local,'Elixir.Worker.ProcessRegistry'},
supervisor,
{{local,'Elixir.Worker.ProcessRegistry'},
'Elixir.Registry.Supervisor',
{unique,'Elixir.Worker.ProcessRegistry',1,[],[{-1,{unique,1,nil,nil,[]}},{-2,{unique,1,nil}}]}},
[]])
15:48:56:913741 (<0.132.0>) link <0.554.0>
15:48:56:913748 (<0.132.0>) out {proc_lib,sync_wait,2}
15:48:56:914126 (<0.132.0>) in {proc_lib,sync_wait,2}
15:48:56:914195 (<0.132.0>) << {ack,<0.554.0>,
{error,
{shutdown,
{failed_to_start_child,
'Elixir.Worker.ProcessRegistry.PIDPartition0',
{already_started,<0.134.0>}}}}}
15:48:56:914216 (<0.132.0>) getting_unlinked <0.554.0>
15:48:56:914218 (<0.132.0>) << {'EXIT',<0.554.0>,
{shutdown,
{failed_to_start_child,
'Elixir.Worker.ProcessRegistry.PIDPartition0',
{already_started,<0.134.0>}}}}
15:48:56:914274 (<0.132.0>) <0.132.0> ! {'$gen_cast',
{try_again_restart,'Elixir.Worker.ProcessRegistry'}}
15:48:56:914288 (<0.132.0>) << {'$gen_cast',{try_again_restart,'Elixir.Worker.ProcessRegistry'}}
15:48:56:914329 (<0.132.0>) exit shutdown
15:48:56:914334 (<0.132.0>) unregister 'Elixir.Toto.Supervisor'
15:48:56:914337 (<0.132.0>) out_exited 0
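My reading of the trace, for what it’s worth: each restart of the Registry fails with already_started because the partition name is still held by <0.134.0>, and after three failed attempts the fourth try_again_restart makes Toto.Supervisor give up and exit with shutdown. That looks like the supervisor’s restart intensity limit kicking in. Below is a tiny Python model of that rule (a sketch, not the OTP code), assuming the Elixir defaults of max_restarts: 3 and max_seconds: 5; I haven’t checked what Toto.Supervisor actually uses. The names RestartIntensity and simulate are made up for illustration.

```python
from collections import deque


class RestartIntensity:
    """Simplified model of a supervisor's restart intensity check.

    Assumption: max_restarts=3 and max_seconds=5 (the Elixir defaults);
    the real values used by Toto.Supervisor are not visible in the trace.
    """

    def __init__(self, max_restarts=3, max_seconds=5.0):
        self.max_restarts = max_restarts
        self.max_seconds = max_seconds
        self._restarts = deque()  # timestamps of recent restart attempts

    def record(self, now):
        """Record a restart attempt; return True if the supervisor may go on."""
        self._restarts.append(now)
        # Forget attempts that fell out of the sliding time window.
        while self._restarts and now - self._restarts[0] > self.max_seconds:
            self._restarts.popleft()
        return len(self._restarts) <= self.max_restarts


def simulate(attempt_times, child_start_ok):
    """Replay restart attempts; child_start_ok(t) says whether the start succeeds."""
    tracker = RestartIntensity()
    for t in attempt_times:
        if not tracker.record(t):
            return "supervisor shutdown"
        if child_start_ok(t):
            return "child restarted"
    return "still retrying"


# Seconds within 15:48, taken from the spawn / try_again_restart lines above.
trace_attempts = [56.912207, 56.912788, 56.913733, 56.914288]

# Every start fails (already_started), so the fourth attempt exceeds the limit.
result = simulate(trace_attempts, lambda t: False)
```

The four attempts are about two milliseconds apart in total, so they all land in the same window. If the old partition process had gone away before one of the retries, that start would have succeeded and the supervisor would have survived.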
I’m not very experienced with tracing, so any help would be awesome!