Mnesia crashing on nodeup in clustered configuration

I am running into a problem where mnesia is crashing on nodeup in clustered mode:

Mnesia('vanguard@test02'): mnesia_monitor starting: <0.259.0>
Mnesia('vanguard@test02'): Mnesia debug level set to verbose
Mnesia('vanguard@test02'): mnesia_subscr starting: <0.260.0>
Mnesia('vanguard@test02'): mnesia_locker starting: <0.261.0>
Mnesia('vanguard@test02'): mnesia_recover starting: <0.262.0>
Mnesia('vanguard@test02'): mnesia_tm starting: <0.263.0>
Mnesia('vanguard@test02'): Schema initiated from: default
Mnesia('vanguard@test02'): mnesia_controller starting: <0.265.0>
Mnesia('vanguard@test02'): mnesia_late_loader starting: <0.266.0>
Mnesia('vanguard@test02'): Intend to load tables: []
Mnesia('vanguard@test02'): mnesia_controller terminated: shutdown
Mnesia('vanguard@test02'): mnesia_tm terminated: shutdown
Mnesia('vanguard@test02'): mnesia_recover terminated: shutdown
Mnesia('vanguard@test02'): mnesia_locker terminated: shutdown
Mnesia('vanguard@test02'): mnesia_subscr terminated: shutdown
Mnesia('vanguard@test02'): mnesia_monitor terminated: shutdown
Mnesia('vanguard@test02'): mnesia_monitor starting: <0.280.0>
Mnesia('vanguard@test02'): Mnesia debug level set to verbose
Mnesia('vanguard@test02'): mnesia_subscr starting: <0.281.0>
Mnesia('vanguard@test02'): mnesia_locker starting: <0.282.0>
Mnesia('vanguard@test02'): mnesia_recover starting: <0.283.0>
Mnesia('vanguard@test02'): mnesia_tm starting: <0.284.0>
Mnesia('vanguard@test02'): Schema initiated from: default
Mnesia('vanguard@test02'): mnesia_controller starting: <0.286.0>
Mnesia('vanguard@test02'): mnesia_late_loader starting: <0.287.0>
Mnesia('vanguard@test02'): Intend to load tables: []
Mnesia('vanguard@test02'): Mnesia started, 0 seconds
[info] Application mnesia exited: :stopped
Mnesia('vanguard@test02'): Logging mnesia_up 'vanguard@test01'
Mnesia('vanguard@test02'): Logging mnesia_up 'vanguard@test02'
Mnesia('vanguard@test02'): Creating index for 'Elixir.ProcStore' 3 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.HandlerStore' 3 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.HandlerStore' 7 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.StatusStore' 3 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.StatusStore' 4 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.CheckStore' 3 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.CheckStore' 4 ordered
Mnesia('vanguard@test02'): Creating index for 'Elixir.CheckStore' 10 ordered

If I kill node 1, wait, then bring it back up I get the following:

Mnesia('vanguard@test01'): mnesia_monitor starting: <0.263.0>
Mnesia('vanguard@test01'): Mnesia debug level set to verbose
Mnesia('vanguard@test01'): mnesia_subscr starting: <0.264.0>
Mnesia('vanguard@test01'): mnesia_locker starting: <0.265.0>
Mnesia('vanguard@test01'): mnesia_recover starting: <0.266.0>
Mnesia('vanguard@test01'): mnesia_tm starting: <0.267.0>
Mnesia('vanguard@test01'): Schema initiated from: disc
Mnesia('vanguard@test01'): Initial dump of log during startup: [dumped,
                                                                               dumped]
Mnesia('vanguard@test01'): mnesia_controller starting: <0.290.0>
Mnesia('vanguard@test01'): mnesia_late_loader starting: <0.291.0>
Mnesia('vanguard@test01'): Intend to load tables: [{'Elixir.ProcStore',
                                                                   ram_only}]
Mnesia('vanguard@test01'): Logging mnesia_up 'vanguard@test02'
Mnesia('vanguard@test01'): Logging mnesia_up 'vanguard@test03'
Mnesia('vanguard@test01'): mnesia_controller terminated: shutdown
Mnesia('vanguard@test01'): mnesia_tm terminated: shutdown
Mnesia('vanguard@test01'): mnesia_recover terminated: shutdown
Mnesia('vanguard@test01'): mnesia_locker terminated: shutdown
Mnesia('vanguard@test01'): mnesia_subscr terminated: shutdown
Mnesia('vanguard@test01'): mnesia_monitor terminated: shutdown
Mnesia('vanguard@test01'): mnesia_monitor starting: <0.319.0>
Mnesia('vanguard@test01'): Mnesia debug level set to verbose
Mnesia('vanguard@test01'): mnesia_subscr starting: <0.320.0>
Mnesia('vanguard@test01'): mnesia_locker starting: <0.321.0>
Mnesia('vanguard@test01'): mnesia_recover starting: <0.322.0>
Mnesia('vanguard@test01'): mnesia_tm starting: <0.323.0>
Mnesia('vanguard@test01'): Schema initiated from: default
Mnesia('vanguard@test01'): mnesia_controller starting: <0.325.0>
Mnesia('vanguard@test01'): mnesia_late_loader starting: <0.326.0>
Mnesia('vanguard@test01'): Intend to load tables: []
Mnesia('vanguard@test01'): Mnesia started, 0 seconds
[error] Error in process #PID<0.295.0> on node :"vanguard@test01" with exit value:
{:badarg, [{:ets, :insert, [:mnesia_gvar, {:last_error, {{:aborted, {:node_not_running, :"vanguard@test01"}}, [{:mnesia_loader, :wait_on_load_complete, 1, [file: 'mnesia_loader.erl', line: 358]}, {:mnesia_tm, :apply_fun, 3, [file: 'mnesia_tm.erl', line: 836]}, {:mnesia_tm, :execute_transaction, 5, [file: 'mnesia_tm.erl', line: 811]}, {:mnesia_loader, :init_receiver, 5, [file: 'mnesia_loader.erl', line: 284]}, {:mnesia_loader, :do_get_network_copy, 5, [file: 'mnesia_loader.erl', line: 220]}, {:mnesia_controller, :"-load_table_fun/1-fun-3-", 5, [file: 'mnesia_controller.erl', line: 2180]}, {:mnesia_controller, :"-load_and_reply/2-fun-0-", 2, [file: 'mnesia_controller.erl', line: 2138]}]}}], []}, {:mnesia_lib, :set, 2, [file: 'mnesia_lib.erl', line: 444]}, {:mnesia_lib, :fix_error, 1, [file: 'mnesia_lib.erl', line: 906]}, {:mnesia_tm, :return_abort, 3, [file: 'mnesia_tm.erl', line: 958]}, {:mnesia_loader, :init_receiver, 5, [file: 'mnesia_loader.erl', line: 284]}, {:mnesia_loader, :do_get_network_copy, 5, [file: 'mnesia_loader.erl', line: 220]}, {:mnesia_controller, :"-load_table_fun/1-fun-3-", 5, [file: 'mnesia_controller.erl', line: 2180]}, {:mnesia_controller, :"-load_and_reply/2-fun-0-", 2, [file: 'mnesia_controller.erl', line: 2138]}]}

[error] Error in process #PID<0.294.0> on node :"vanguard@test01" with exit value:
{:badarg, [{:ets, :insert, [:mnesia_gvar, {:last_error, {{:aborted, {:node_not_running, :"vanguard@test01"}}, [{:mnesia_loader, :wait_on_load_complete, 1, [file: 'mnesia_loader.erl', line: 358]}, {:mnesia_tm, :apply_fun, 3, [file: 'mnesia_tm.erl', line: 836]}, {:mnesia_tm, :execute_transaction, 5, [file: 'mnesia_tm.erl', line: 811]}, {:mnesia_loader, :init_receiver, 5, [file: 'mnesia_loader.erl', line: 284]}, {:mnesia_loader, :do_get_network_copy, 5, [file: 'mnesia_loader.erl', line: 220]}, {:mnesia_controller, :"-load_table_fun/1-fun-3-", 5, [file: 'mnesia_controller.erl', line: 2180]}, {:mnesia_controller, :"-load_and_reply/2-fun-0-", 2, [file: 'mnesia_controller.erl', line: 2138]}]}}], []}, {:mnesia_lib, :set, 2, [file: 'mnesia_lib.erl', line: 444]}, {:mnesia_lib, :fix_error, 1, [file: 'mnesia_lib.erl', line: 906]}, {:mnesia_tm, :return_abort, 3, [file: 'mnesia_tm.erl', line: 958]}, {:mnesia_loader, :init_receiver, 5, [file: 'mnesia_loader.erl', line: 284]}, {:mnesia_loader, :do_get_network_copy, 5, [file: 'mnesia_loader.erl', line: 220]}, {:mnesia_controller, :"-load_table_fun/1-fun-3-", 5, [file: 'mnesia_controller.erl', line: 2180]}, {:mnesia_controller, :"-load_and_reply/2-fun-0-", 2, [file: 'mnesia_controller.erl', line: 2138]}]}

[info] Application mnesia exited: :stopped
Mnesia('vanguard@test01'): Logging mnesia_up 'vanguard@test03'
Mnesia('vanguard@test01'): Logging mnesia_up 'vanguard@test02'
Mnesia('vanguard@test01'): Logging mnesia_up 'vanguard@test01'
Mnesia('vanguard@test01'): Creating index for 'Elixir.ProcStore' 3 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.CheckStore' 3 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.CheckStore' 4 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.CheckStore' 10 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.HandlerStore' 3 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.HandlerStore' 7 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.StatusStore' 3 ordered
Mnesia('vanguard@test01'): Creating index for 'Elixir.StatusStore' 4 ordered

I’m not really able to make heads or tails of the situation unfortunately. I am using libcluster, Swarm and Mnesiam for distributed libs. Any and all help/tips/refrences are appreciated.

1 Like

The above was with debug logging enabled. Here is trace:

Can you start fresh ?

From the first set of logs you may have started vanguard@test02 too soon and it was still waiting for vanguard@test1 to start. Or still waiting for the node with the schema defined to start.

The second set of logs were likely caused by killing node 1.

What is you cluster configuration ? Do you have access to the SASL logs ?

@tty I posted a gist with the logs from the following scenario:

  1. Bring node 1 up
  2. wait N minutes
  3. Bring node 2 up

Let me look into how to gather the SASL logs

Read the logs a bit closer and its not due to network connectivity. SASL logs would be helpful.

They all have the same cookie. That’s expected behavior and comes from libcluster, since it’s the first node to come up. I did however notice this on node 2:

Mnesia('vanguard@test02'): write performed by {tid,3,<0.291.0>} on record:
        {schema,schema,
            [{name,schema},
             {type,set},
             {ram_copies,['vanguard@test02']},
             {disc_copies,['vanguard@test01']},
             {disc_only_copies,[]},
             {load_order,0},
             {access_mode,read_write},
             {majority,false},
             {index,[]},
             {snmp,[]},
             {local_content,false},
             {record_name,schema},
             {attributes,[table,cstruct]},
             {user_properties,[]},
             {frag_properties,[]},
             {storage_properties,[]},
             {cookie,
                 {{1532443549373280408,-576460752303423485,1},
                  'vanguard@test01'}},
             {version,
                 {{4,0},
                  {'vanguard@test02',{1532,443782,97859}}}}]}
[info] Application mnesia exited: :stopped

I wonder if this is the cause? All schema/tables should be :disk_copies.

iex(vanguard@test01)4> Node.ping(:"vanguard@test02")
:pong

Likely it is. Tables default to ram_copies if the schema wasn’t distributed as disc_copies over all nodes.

ok so likely it’s somewhere in this fork of mnesiam:
https://github.com/beardedeagle/mnesiam

Ok, I’ve since forked mnesiam here (with the blessing of the original lib author):

And i’ve documented the issue here to work on it as I have time:

2 Likes