Need help with a segmentation fault/bus error

Hi there, I’m new to all this, so please bear with me.

I rebuilt a part of our system using Flow with a couple of .window_join calls, meaning 3 flows in total are joined together. When I run my test suite with mix test, it passes in about 2/3 of the runs and crashes with a segmentation fault (exit code 139) in about 1/3. Very rarely I also get a crash with a bus error (exit code 138), and only once did it crash while writing an erl_crash.dump.
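
To put a number on how often each kind of crash happens, a small helper along these lines could tally exit codes over many runs. This is only a sketch: the function name tally_exit_codes is made up for illustration, and it assumes a POSIX shell. For reference, a shell reports 128 + signal number when a process is killed by a signal, so 139 is signal 11 (SIGSEGV) and 138 is signal 10, which is SIGBUS on macOS.

```shell
# Hypothetical helper: run a command repeatedly and count each exit code.
# Usage: tally_exit_codes RUNS COMMAND [ARGS...]
tally_exit_codes() {
  runs=$1
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the command's output; only the exit status matters here.
    "$@" >/dev/null 2>&1
    echo "exit $?"
    i=$((i + 1))
  done | sort | uniq -c
}
```

Calling it as `tally_exit_codes 30 mix test` would print one line per distinct exit code, with the number of runs that ended with that code in front.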

I tried to strip it down to a minimal example I could share that reproduces the crash, but since there are a bunch of moving parts involved, I haven’t had any luck so far.

It seems the erl_crash.dump does not contain sensitive data; please correct me if I’m wrong. I could share the file if that would be helpful.

$ elixir -v
Erlang/OTP 22 [erts-10.4.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]

Elixir 1.9.1 (compiled with Erlang/OTP 22)

What should I do? Thanks!

I was able to boil the code down to 4 lines that provoke the crash; see: https://github.com/larskluge/crash

Ran it a few times; it seems to work fine:

Finished in 0.04 seconds
1 doctest, 1 test, 0 failures

Randomized with seed 511061
$ elixir -v
Erlang/OTP 21 [erts-10.3.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Elixir 1.9.1 (compiled with Erlang/OTP 21)

You might want to try downgrading your OTP. I’m on Ubuntu 18.04.

Very interesting! Thanks for looking into this @yurko!

Will likely continue the conversation here now: https://github.com/plataformatec/flow/issues/89

What do the first 4 lines of erl_crash.dump say? Especially the line starting with “Slogan:”? It should tell you why Erlang crashed.
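
In case it helps, here is a minimal sketch for pulling out just that line. The function name crash_slogan is made up for illustration, and it assumes a grep that supports -m (both GNU and BSD grep do).

```shell
# Hypothetical helper: print the first "Slogan:" line of a crash dump.
# Usage: crash_slogan PATH_TO_DUMP
crash_slogan() {
  # -m 1 stops after the first match, so large dumps are not read in full.
  grep -m 1 '^Slogan:' "$1"
}
```

Running `head -n 4 erl_crash.dump` shows the surrounding header lines as well, and `crash_slogan erl_crash.dump` prints only the slogan.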

Sorry, I had to hand in my Mac for repair and my phone can’t open a text file… however, here is the dump:

https://www.dropbox.com/s/4bec7kevyc795gm/2019-07-30%20erl_crash.dump?dl=0

Thanks!

$ curl -sL 'https://www.dropbox.com/s/4bec7kevyc795gm/2019-07-30%20erl_crash.dump?dl=0' | head
=erl_crash_dump:0.5
Tue Jul 30 17:48:21 2019
Slogan: bad header tag 0
System version: Erlang/OTP 22 [erts-10.4.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]
Compiled: Thu Jul 11 12:57:25 2019
Taints: crypto,asn1rt_nif
Atoms: 27121
Calling Thread: scheduler:5
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | POLL_SLEEPING | TSE_SLEEPING | WAITING

The question, of course, is what “bad header tag” means as a reason for Erlang to crash. :wink:

I asked in the Erlang Slack and got the reply that it comes from a corrupted map. I am guessing, GUESSING mind you, that it may come from a NIF which constructs a badly formed map.
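
One way to narrow that guess down: the Taints: line in the dump header above names the natively loaded libraries (here crypto and asn1rt_nif). A rough sketch for listing them one per line follows; the function name dump_taints is made up for illustration, and it assumes the comma-separated format shown in the header above.

```shell
# Hypothetical helper: list the entries of the "Taints:" header line,
# one per line.
# Usage: dump_taints PATH_TO_DUMP
dump_taints() {
  # Strip the "Taints: " prefix, keep the first match, split on commas.
  sed -n 's/^Taints: //p' "$1" | head -n 1 | tr ',' '\n'
}
```

Any name in that list that does not ship with OTP would be a candidate to look at first.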

It could be a NIF, but a corrupted map sounds like an OTP issue I heard about recently that has a PR fixing it, though I thought that was in 22, not 21…

Yes, 21 is working; the crash was reproducible in 22, and the discussion then moved to https://github.com/plataformatec/flow/issues/89 and https://bugs.erlang.org/browse/ERL-1017

Sorry, a bit late here. I’m back online now. However, @yurko already linked all important resources. The issue is already fixed and will go out with the next Erlang patch release. Thank you all for your support!
