Strange exception handling by guard clause

emeryotopalik · December 10, 2021, 9:41pm

Hello. I am experiencing an unexpected behavior with the way Elixir is handling a guard clause exception. Take the following:

defmodule Foo do
  def eval(x, percentage) when rem(x, 100) < percentage, do: true
  def eval(_, _), do: false
end

Foo.eval(nil, 20) # Sometimes this is false, sometimes it is true

I have a function very similar to the one above running in my application. Occasionally, I pass x = nil to the function. I would expect calling rem(nil, 100) to ALWAYS raise an ArithmeticError, so the guard would fail and we would fall through to the next eval function head.

But, that is not the case. I’ve tried digging into the Elixir source code to decipher how exceptions are handled in guards, but I get a bit turned around. My best guess is that the exception is being compared in the < evaluation itself, and thus the flakiness? I truly do not know.

I do understand that the way to fix this would be to first check is_integer(x). I am merely trying to understand why the exception is not being handled as I expect in the first place.
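For reference, the is_integer(x) check mentioned above could look like the sketch below. The module name SafeFoo is hypothetical and is not part of the original application; the sketch only makes the intent explicit and says nothing about why the original version is flaky.

```elixir
defmodule SafeFoo do
  # Hypothetical variant of Foo.eval/2. The is_integer/1 check runs first,
  # so rem/2 is only evaluated when x is actually an integer.
  def eval(x, percentage) when is_integer(x) and rem(x, 100) < percentage, do: true
  def eval(_, _), do: false
end

IO.inspect(SafeFoo.eval(5, 20))    # true: rem(5, 100) is 5, and 5 < 20
IO.inspect(SafeFoo.eval(150, 20))  # false: rem(150, 100) is 50
IO.inspect(SafeFoo.eval(nil, 20))  # false: nil is not an integer
```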

trisolaran · December 10, 2021, 9:57pm

I tried your code. Foo.eval(nil, 20) always returns false to me.

iex(36)> 1..10000 |> Enum.map(fn _ -> Foo.eval(nil, 20) end) |> Enum.uniq
[false]

stefanluptak · December 10, 2021, 9:58pm

Same here:

iex(12)> Enum.all?(1..100000000, fn _i -> Foo.eval(nil, 20) == false end)
true

emeryotopalik · December 10, 2021, 10:08pm

I wonder if this is maybe only reproducible in an application. When I run this code in iex, I agree it always returns false.

dimitarvp · December 10, 2021, 11:29pm

Maybe you’re using an older Elixir version for your app?

al2o3cr · December 10, 2021, 11:44pm

Exceptions aren’t propagated outside the guard, they are considered “failure to match”:

If an arithmetic expression, a Boolean expression, a short-circuit expression, or a call to a guard BIF fails (because of invalid arguments), the entire guard fails. If the guard was part of a guard sequence, the next guard in the sequence (that is, the guard following the next semicolon) is evaluated.

(from the Erlang reference manual)

I’d guess the intent was to avoid having to spam is_integer and similar checks.
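A small sketch of the rule quoted above, in Elixir terms. The module and function names are made up for illustration. Elixir writes a guard sequence as multiple `when` clauses rather than with semicolons, and the results in the comments are what a runtime that follows the documented behavior should give.

```elixir
defmodule GuardDemo do
  # Same shape as Foo.eval/2 from the original post.
  def eval(x, percentage) when rem(x, 100) < percentage, do: true
  def eval(_, _), do: false

  # Two guards in sequence: if the first one fails (here because rem/2
  # gets a non-integer), the second one is still evaluated.
  def two_guards(x) when rem(x, 100) < 20 when is_nil(x), do: true
  def two_guards(_), do: false

  # One guard joined with `or`: the failing rem/2 call fails the entire
  # guard, so is_nil/1 is never consulted.
  def one_guard_with_or(x) when rem(x, 100) < 20 or is_nil(x), do: true
  def one_guard_with_or(_), do: false
end

# Outside a guard, the same call raises instead of failing silently.
outside_guard = fn x ->
  try do
    rem(x, 100)
  rescue
    ArithmeticError -> :raised
  end
end

IO.inspect(outside_guard.(nil))               # :raised
IO.inspect(GuardDemo.eval(nil, 20))           # false
IO.inspect(GuardDemo.two_guards(nil))         # true
IO.inspect(GuardDemo.one_guard_with_or(nil))  # false
```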

rvirding · December 11, 2021, 12:06am

Yes, we felt that it was better to be consistent and have all exceptions in guards just cause the guard to fail. It could otherwise sometimes be very difficult to add tests to avoid all errors in guards.

emeryotopalik · December 11, 2021, 2:42am

Nah, on 1.12

eksperimental · December 11, 2021, 8:24am

I can confirm this issue, as I have managed to reproduce it, @emeryotopalik.

It sounds like a serious one. When it fails, it fails repeatedly for a while, may then go back to normal, and may later fail again.

I am running 64-bit Arch Linux with OTP 24.1.2, and my elixir --version is:

$ elixir --version
Erlang/OTP 24 [erts-12.1.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.12.2 (compiled with Erlang/OTP 24)

mix test fails big time:

$ mix test
Compiling 3 files (.ex)
Compiling lib/bar.ex (it's taking more than 10s)
Generated debug_me app

  1) test Compile time (FooTest)
     test/foo_test.exs:19
     Assertion with == failed
     code:  assert report(Bar.result()) == [false: times]
     left:  [{false, 1970374}, {true, 8029626}]
     right: [false: 10000000]
     stacktrace:
       test/foo_test.exs:22: (test)

.

  2) test Runtime (FooTest)
     test/foo_test.exs:12
     Assertion with == failed
     code:  assert report(result) == [false: times]
     left:  [
              {false, 606},
              {true, 640},
              {false, 588},
              {true, 1264},
              {false, 597},
              {true, 528},
              {false, 194331},
              {true, 2000},
              {false, 1333},
              {true, 1333},
              {false, 666},
              {true, 9796114}
            ]
     right: [false: 10000000]
     stacktrace:
       test/foo_test.exs:16: (test)
..

Finished in 5.1 seconds (0.00s async, 5.1s sync)
1 doctest, 4 tests, 2 failures

Randomized with seed 934406

$ MIX_ENV=test iex -S mix
Erlang/OTP 24 [erts-12.1.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> for chunk <- Enum.chunk_by(Bar.result(), & &1), do: {hd(chunk), length(chunk)}
[false: 1970374, true: 8029626]

iex(2)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()       
[false: 10000000]

iex(3)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()
[false: 10000000]

iex(4)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()
[false: 10000000]

Now I run it with OTP 23.3.4.9, and mix test passes all tests.
Here are the commands in IEx:

$ iex -S mix
Erlang/OTP 23 [erts-11.2.2.8] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> for chunk <- Enum.chunk_by(Bar.result(), & &1), do: {hd(chunk), length(chunk)}
[false: 10000000]

iex(2)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()
[false: 10000000]

…and it’s all good.
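Foo.report/1 is not shown in the thread. Assuming it produces the same run-length summary as the Enum.chunk_by comprehension used in the sessions above, a minimal sketch could be the following (ReportSketch is a hypothetical name):

```elixir
defmodule ReportSketch do
  # Hypothetical stand-in for Foo.report/1: collapses consecutive equal
  # results into {value, run_length} pairs, preserving their order.
  def report(results) do
    for chunk <- Enum.chunk_by(results, & &1), do: {hd(chunk), length(chunk)}
  end
end

IO.inspect(ReportSketch.report([false, false, true, true, true, false]))
# [false: 2, true: 3, false: 1]
```

A healthy run therefore collapses to a single pair such as [false: 10000000], while alternating runs show up as several pairs.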

So my quick assumption is that this is due to the JIT compiler introduced in OTP 24.

I have created a project for things like this, where I test an Elixir project against every Elixir/OTP version combination.
Surprisingly, it does not fail for OTP 24 with JIT. (It only fails with OTP 17, where it does not even compile, so that is unrelated.)

https://github.com/eksperimental/debug_me/runs/4491828158?check_suite_focus=true

the source code of Elixir project can be found here:
GitHub - eksperimental/debug_me at emeryotopalik

UPDATE 1:

I noticed that when it fails, it fails after ~600 results (or a multiple of this number). This is after running MIX_ENV=test mix do compile --force, test multiple times:

these 2 are from the same compilation:

# Compile times:  1998
[{false, 1246}, {true, 1186}, {false, 666}, {true, 1126}, {false, 666}, {true, 5110}]

# Run time: Note that 1998 is 666 * 3
[{false, 1998}, {true, 8002}]
----------------

then I got:

# Runtime (but Compile time did not fail)
[{false, 606}, {true, 999394}]

--------
# Compiletime
 [{false, 665}, {true, 99335}]

---------
# Compile time
[{false, 665}, {true, 99335}]

krstfk · December 11, 2021, 1:11pm

I believe this has to do with this bug: https://github.com/erlang/otp/issues/5401 which has been fixed with https://github.com/erlang/otp/pull/5409 and was part of erts 12.1.4 (I believe released with Erlang 24.1.5).
I could not reproduce it using your repo:

$ iex --version
Erlang/OTP 24 [erts-12.1.5] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit] [dtrace]

IEx 1.12.3 (compiled with Erlang/OTP 24)

$ mix test
.....

Finished in 4.2 seconds (0.00s async, 4.2s sync)
1 doctest, 4 tests, 0 failures

emeryotopalik · December 13, 2021, 7:58pm

Updated and it seems to be working consistently for me now! Thanks for the spot.

eksperimental · December 13, 2021, 7:59pm

Updated to what version?
Or do you mean you downgraded to OTP 23? Note that after the fix, there have been no new OTP releases up to now.

krstfk · December 13, 2021, 8:13pm

The fix was released in 24.1.4 (see the release notes). Since then 3 minor versions have been released.
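To see which side of the fix a given node is on, the running ERTS version can be compared against the 12.1.4 mentioned earlier in the thread. This is only a sketch: the module name is hypothetical, the threshold is taken from the posts above rather than verified independently, and it assumes the version string is purely numeric and dot-separated.

```elixir
defmodule RuntimeCheck do
  # ERTS version that, according to the thread, contains the fix.
  @fixed_erts [12, 1, 4]

  # :erlang.system_info(:version) returns the ERTS version as a charlist,
  # for example '12.1.2'.
  def erts_version do
    :erlang.system_info(:version)
    |> List.to_string()
    |> parse()
  end

  # "11.2.2.8" -> [11, 2, 2, 8]
  def parse(version) do
    version |> String.split(".") |> Enum.map(&String.to_integer/1)
  end

  # Lists compare element by element, so [12, 1, 5] >= [12, 1, 4].
  # Note that OTP 23 (ERTS 11.x) returns false here even though the
  # thread reports it as unaffected.
  def at_least_fixed_erts?(version \\ erts_version()), do: version >= @fixed_erts
end

IO.inspect(RuntimeCheck.erts_version())
IO.inspect(RuntimeCheck.at_least_fixed_erts?())
```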

eksperimental · December 13, 2021, 8:25pm

It seems like there were two commits to fix related issues.

The commit that you mentioned first is only available in master, hence my comment.

The other commit, which is the one referenced in the release notes and is probably the one that fixes the guard issue, has been in the last 3 PATCH releases.
This explains why the GitHub workflow was not failing.

krstfk · December 13, 2021, 8:34pm

Right, sorry about that. I got confused because both mentioned rem.

eksperimental · December 13, 2021, 8:39pm

Well, the good bit is that there is already a fix for this and it has been released.