Hey everyone,
ever since the introduction of profiling into Benchee the test suite has experienced flakiness.
We basically call the built-in tasks like this Eprof task in Elixir.
Now, with 40 build configuration running, quite often we get a failure like this one:
1) test hooks all kinds of hooks with inputs work (Benchee.ProfileTest)
Error: test/benchee/profile_test.exs:164
** (exit) exited in: :gen_server.call(:eprof, {:profile_start, [], {:_, :_, :_}, {:erlang, :apply, [#Function<6.45063716/0 in Benchee.Benchmark.Runner.main_function/2>, []]}, [set_on_spawn: true]}, :infinity)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
code: retrying(fn ->
stacktrace:
(stdlib) gen_server.erl:223: :gen_server.call/3
Warning: (mix) lib/mix/tasks/profile.eprof.ex:197: Mix.Tasks.Profile.Eprof.profile_and_analyse/2
(mix) lib/mix/tasks/profile.eprof.ex:186: Mix.Tasks.Profile.Eprof.profile/2
Warning: (benchee) lib/benchee/benchmark/collect/profile.ex:17: Benchee.Benchmark.Collect.Profile.collect/2
(benchee) lib/benchee/benchmark/runner.ex:291: Benchee.Benchmark.Runner.collect/3
Warning: (benchee) lib/benchee/benchmark/run_once.ex:15: Benchee.Benchmark.RunOnce.run/3
(elixir) lib/enum.ex:783: Enum."-each/2-lists^foreach/1-0-"/2
Warning: (elixir) lib/enum.ex:783: Enum.each/2
(benchee) lib/benchee/profile.ex:59: Benchee.Profile.profile/2
Warning: (ex_unit) lib/ex_unit/capture_io.ex:151: ExUnit.CaptureIO.do_capture_io/2
(ex_unit) lib/ex_unit/capture_io.ex:121: ExUnit.CaptureIO.do_capture_io/3
Warning: test/benchee/profile_test.exs:166: anonymous fn/0 in Benchee.ProfileTest."test hooks all kinds of hooks with inputs work"/1
(benchee) test/support/test_helpers.ex:17: Benchee.TestHelpers.retrying/2
Warning: test/benchee/profile_test.exs:165: (test)
Which is weird, as the elixir code explicitly starts eprof
(yes even in 1.9.4, I double checked, those are lines 196-197):
:eprof.start()
:eprof.profile([], fun, Keyword.get(opts, :matching, {:_, :_, :_}))
Some thoughts:
- Now you could assume cross test taint but the test file under question runs with
async: false
. - The return values of
:eprof.start()
are also basically “started correctly” or “I was already started, here is my pid” so I don’t think checking these would help Erlang -- eprof - It seems to happen mostly with eprof, but that might be because eprof is our default profiler.
- rerunning a failed run with the same seed also works (I was worried that a specific order of tests might crash eprof or something but since we start it again
)
Basically I’m at wits end. I now started retrying these tests (finally correctly ) but that also doesn’t feel right. I hope I’m just missing something stupid, which I’d be happy for y’all to tell me about
If anyone had any idea why these keep failing I’d be happy to hear them and fix it for real
Thanks!
Tobi
PS: cc @pablocostass if you have any idea