Given this shitty application:
children = [
  %{start: {Process, :sleep, [1_000_000]}, id: 1},
  %{start: {Process, :sleep, [1_000_000]}, id: 2}
]
Supervisor.init(children, [strategy: :one_for_one])
Each child sleeps for 1_000_000 ms, so it obviously takes about 2000 seconds for the supervisor to launch (ignoring that Process.sleep/1 returns :ok, which the supervisor doesn’t accept as a start result).
How would I know which child it’s currently working on without changing any code?
Background: in a larger application, I’m debugging a rare case where one of our nodes fails to fully start the application supervision tree, but I’m unsure where exactly it’s stuck. There must be some start_link function that’s just an endless loop.
I can connect to it from another node in the cluster and ask questions via :erpc, such as the current stack trace, but that’s mostly useless:
:erpc.call(other, fn -> Process.info(Process.whereis(MyApp.Supervisor), :current_stacktrace) end)
{:current_stacktrace,
 [
   {:proc_lib, :sync_start_link, 2, [file: 'proc_lib.erl', line: 351]},
   {:supervisor, :do_start_child_i, 3, [file: 'supervisor.erl', line: 414]},
   {:supervisor, :do_start_child, 2, [file: 'supervisor.erl', line: 400]},
   {:supervisor, :"-start_children/2-fun-0-", 3,
    [file: 'supervisor.erl', line: 384]},
   {:supervisor, :children_map, 4, [file: 'supervisor.erl', line: 1250]},
   {:supervisor, :init_children, 2, [file: 'supervisor.erl', line: 350]},
   {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]},
   {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]},
   {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}
 ]}
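One idea that might narrow it down, sketched under a few assumptions: the :proc_lib.sync_start_link frame suggests the supervisor has already spawned the child and is waiting for it to acknowledge, so the stuck child should show up among the supervisor's links. The helper below lists every local process linked to the supervisor with its proc_lib "$initial_call" entry and its own stack trace. StartupProbe and linked_processes/1 are hypothetical names, not part of any library, and this only helps if the child is started through proc_lib (GenServer, Agent, Task and friends).

```elixir
defmodule StartupProbe do
  # Hypothetical helper, not a library API.
  # Returns one map per local process linked to `sup`. The links include the
  # supervisor's parent and every child started so far, so the stuck child is
  # only one of the entries and still has to be picked out by eye.
  def linked_processes(sup) when is_pid(sup) do
    {:links, links} = Process.info(sup, :links)

    for pid <- links,
        # Links may also contain ports or remote pids; skip those.
        is_pid(pid),
        node(pid) == node(),
        # Process.info/2 returns nil if the process has died in the meantime.
        info = Process.info(pid, [:dictionary, :current_stacktrace]) do
      %{
        pid: pid,
        # proc_lib stores the callback it was asked to run under this key,
        # e.g. {MyWorker, :init, 1} for a GenServer; nil for plain spawns.
        initial_call:
          case List.keyfind(info[:dictionary], :"$initial_call", 0) do
            {_key, mfa} -> mfa
            nil -> nil
          end,
        current_stacktrace: info[:current_stacktrace]
      }
    end
  end
end
```

Since this module would not exist on the stuck node, the body of linked_processes/1 would have to be pasted into the fun passed to :erpc.call, with Process.whereis(MyApp.Supervisor) as sup. An entry whose stack trace sits inside its own init callback rather than in the usual receive loop would be the likely culprit. In the toy example above this would not apply, because Process.sleep/1 runs inside the supervisor process itself and would appear directly in the supervisor's stack trace.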