Strange exception handling by guard clause

Hello. I am experiencing an unexpected behavior with the way Elixir is handling a guard clause exception. Take the following:

defmodule Foo do
  def eval(x, percentage) when rem(x, 100) < percentage, do: true
  def eval(_, _), do: false
end

Foo.eval(nil, 20) # Sometimes this is false, sometimes it is true

I have a function very similar to the one above running in my application. Occasionally, I pass x = nil to the function. I would expect rem(nil, 100) to ALWAYS raise an ArithmeticError, so the guard would fail and we would fall through to the next eval function head.

But, that is not the case. I’ve tried digging into the Elixir source code to decipher how exceptions are handled in guards, but I get a bit turned around. My best guess is that the exception is being compared in the < evaluation itself, and thus the flakiness? I truly do not know.

I do understand that the way to fix this would be to first check is_integer(x). I am merely trying to understand why the exception is not being handled as I expect in the first place.
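
For reference, the is_integer(x) check mentioned above could look roughly like the sketch below. The module name SafeFoo is hypothetical, chosen only to keep it apart from the original Foo. Because `and` short-circuits in guards, rem/2 is never evaluated for a non-integer, so the result no longer depends on how a failing guard expression is handled.

```elixir
defmodule SafeFoo do
  # Hypothetical variant of Foo: rem/2 only runs once x is known to be an integer.
  # Anything else (including nil) skips this clause without evaluating rem/2.
  def eval(x, percentage) when is_integer(x) and rem(x, 100) < percentage, do: true
  def eval(_, _), do: false
end

SafeFoo.eval(nil, 20)
# => false
SafeFoo.eval(105, 20)
# => true
```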

I tried your code. Foo.eval(nil, 20) always returns false to me.

iex(36)> 1..10000 |> Enum.map(fn _ -> Foo.eval(nil, 20) end) |> Enum.uniq
[false]

Same here:

iex(12)> Enum.all?(1..100000000, fn _i -> Foo.eval(nil, 20) == false end)
true

I wonder if this is maybe only reproducible in an application. When I run this code in iex, I agree it always returns false.

Maybe you’re using an older Elixir version for your app?

Exceptions aren’t propagated outside the guard; they are considered “failure to match”:

If an arithmetic expression, a Boolean expression, a short-circuit expression, or a call to a guard BIF fails (because of invalid arguments), the entire guard fails. If the guard was part of a guard sequence, the next guard in the sequence (that is, the guard following the next semicolon) is evaluated.

(from the Erlang reference manual)

I’d guess the intent was to avoid having to spam is_integer and similar checks.
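
A small sketch of the rule quoted above, as it should behave on a runtime without the bug discussed in this thread. The module and function names are made up for illustration. In Elixir, several `when` clauses on one function head play the role of Erlang's semicolon-separated guard sequence, while `or` keeps everything inside a single guard.

```elixir
defmodule GuardDemo do
  # rem(nil, 2) fails inside the guard, so this clause is skipped instead of raising.
  def check(x) when rem(x, 2) == 0, do: :even
  def check(_), do: :no_match

  # Two `when` clauses form a guard sequence: if the first guard fails,
  # the second one is still evaluated.
  def sequence(x)
      when rem(x, 2) == 0
      when is_nil(x) do
    :matched
  end

  def sequence(_), do: :no_match

  # With `or`, both tests live in one guard, so the failure in rem/2
  # fails the whole guard and is_nil/1 is never consulted.
  def with_or(x) when rem(x, 2) == 0 or is_nil(x), do: :matched
  def with_or(_), do: :no_match
end

GuardDemo.check(nil)
# => :no_match
GuardDemo.sequence(nil)
# => :matched
GuardDemo.with_or(nil)
# => :no_match
```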

Yes, we felt that it was better to be consistent and have all exceptions in guards just cause the guard to fail. It could otherwise sometimes be very difficult to add tests to avoid all errors in guards.

Nah, on 1.12

I can confirm this issue, as I have managed to reproduce it, @emeryotopalik.

And it sounds like a serious one. When it fails, it fails repeatedly for a while, then it may go back to normal, and then it may fail again.

I am running 64-bit Arch Linux with OTP 24.1.2. My elixir --version output is:

$ elixir --version
Erlang/OTP 24 [erts-12.1.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.12.2 (compiled with Erlang/OTP 24)

mix test fails big time:

$ mix test
Compiling 3 files (.ex)
Compiling lib/bar.ex (it's taking more than 10s)
Generated debug_me app

  1) test Compile time (FooTest)
     test/foo_test.exs:19
     Assertion with == failed
     code:  assert report(Bar.result()) == [false: times]
     left:  [{false, 1970374}, {true, 8029626}]
     right: [false: 10000000]
     stacktrace:
       test/foo_test.exs:22: (test)

.

  2) test Runtime (FooTest)
     test/foo_test.exs:12
     Assertion with == failed
     code:  assert report(result) == [false: times]
     left:  [
              {false, 606},
              {true, 640},
              {false, 588},
              {true, 1264},
              {false, 597},
              {true, 528},
              {false, 194331},
              {true, 2000},
              {false, 1333},
              {true, 1333},
              {false, 666},
              {true, 9796114}
            ]
     right: [false: 10000000]
     stacktrace:
       test/foo_test.exs:16: (test)
..

Finished in 5.1 seconds (0.00s async, 5.1s sync)
1 doctest, 4 tests, 2 failures

Randomized with seed 934406
$ MIX_ENV=test iex -S mix
Erlang/OTP 24 [erts-12.1.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> for chunk <- Enum.chunk_by(Bar.result(), & &1), do: {hd(chunk), length(chunk)}
[false: 1970374, true: 8029626]

iex(2)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()       
[false: 10000000]

iex(3)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()
[false: 10000000]

iex(4)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()
[false: 10000000]

Now I run it with OTP 23.3.4.9, and mix test passes all tests.
Here are the commands in IEx:

$ iex -S mix
Erlang/OTP 23 [erts-11.2.2.8] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> for chunk <- Enum.chunk_by(Bar.result(), & &1), do: {hd(chunk), length(chunk)}
[false: 10000000]

iex(2)> Enum.map(1..10_000_000, fn _i -> Foo.eval(nil, 20) end) |> Foo.report()
[false: 10000000]

…and it’s all good.
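
For readers without the repo at hand: judging from the iex expression and the output above, the report helper appears to collapse consecutive equal results into {value, run_length} pairs. The sketch below is a guess at that behavior, not the repo's actual code, and the module name RunReport is hypothetical.

```elixir
defmodule RunReport do
  # Assumed behavior: group consecutive equal results and count each run,
  # mirroring the `Enum.chunk_by/2` comprehension shown in the iex session.
  def report(results) do
    for chunk <- Enum.chunk_by(results, & &1), do: {hd(chunk), length(chunk)}
  end
end

RunReport.report([false, false, true, false])
# => [false: 2, true: 1, false: 1]
```

A single entry such as [false: 10000000] therefore means every call returned false, while alternating false/true entries show where the guard started and stopped misbehaving.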

So my quick assumption is that this is due to the JIT compiler introduced in OTP 24.

I have created a project for things like this, where I test an Elixir project in every Elixir/OTP version combination, and surprisingly it does not fail for OTP 24 with JIT. (It only fails with OTP 17, where it does not even compile, so that is unrelated.)

Run elixir --version in every Workflow test · eksperimental/debug_me@e9bdbea · GitHub

The source code of the Elixir project can be found here:
GitHub - eksperimental/debug_me at emeryotopalik


UPDATE 1:

I noticed that when it fails, it fails after ~600 results (or a multiple of this number). This is after running MIX_ENV=test mix do compile --force, test multiple times:

These 2 are from the same compilation:

# Compile times:  1998
[{false, 1246}, {true, 1186}, {false, 666}, {true, 1126}, {false, 666}, {true, 5110}]

# Run time: Note that 1998 is 666 * 3
[{false, 1998}, {true, 8002}]
----------------

Then I got:

# Runtime (but Compile time did not fail)
[{false, 606}, {true, 999394}]

--------
# Compiletime
 [{false, 665}, {true, 99335}]

---------
# Compile time
[{false, 665}, {true, 99335}]

I believe this has to do with this bug: https://github.com/erlang/otp/issues/5401, which has been fixed by https://github.com/erlang/otp/pull/5409 and was part of erts 12.1.4 (I believe released with Erlang 24.1.5).
I could not reproduce it using your repo:

$ iex --version
Erlang/OTP 24 [erts-12.1.5] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit] [dtrace]

IEx 1.12.3 (compiled with Erlang/OTP 24)

$ mix test
.....

Finished in 4.2 seconds (0.00s async, 4.2s sync)
1 doctest, 4 tests, 0 failures

Updated and it seems to be working consistently for me now! Thanks for the spot.

Updated to what version?
Or do you mean downgraded to OTP 23? Note that after the fix, there have been no new OTP releases up to now.

The fix was released with 24.1.4 (see the release notes). Since then, 3 minor versions have been released.
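
To check which runtime an application is actually on, one option is to compare the ERTS version reported by :erlang.system_info(:version) against a threshold. The sketch below is only an illustration: the module name ErtsCheck is hypothetical, and the 12.1.4 threshold is taken from the posts in this thread rather than verified independently. Versions are compared as integer lists because ERTS versions can have four parts (for example 11.2.2.8).

```elixir
defmodule ErtsCheck do
  # Assumption from this thread: the guard fix shipped in erts 12.1.4.
  @fixed_in [12, 1, 4]

  # Turns "12.1.5" (or the charlist form) into [12, 1, 5].
  def parse(version) do
    version
    |> to_string()
    |> String.split(".")
    |> Enum.map(&String.to_integer/1)
  end

  # Lists compare element by element, so [12, 1, 5] >= [12, 1, 4].
  def at_least?(version, minimum \\ @fixed_in) do
    parse(version) >= minimum
  end
end

# The running system's ERTS version is returned as a charlist.
ErtsCheck.at_least?(:erlang.system_info(:version))
```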

It seems like there were two commits to fix related issues.

The commit that you first mentioned is this.

It is only available in master, hence my comment.

The other commit, which is the one referenced in the release notes and is probably the commit that fixes the guard issue, is

and it has been in the last 3 PATCH releases.
This explains why the GitHub workflow was not failing.

Right, sorry about that. I got confused because both mentioned rem.

Well, the good bit is that there is already a fix for this :grinning: and it has been released.