Exavier - Mutation Testing library for Elixir

I created this lib to learn more about code compilation in Elixir and about ExUnit, and also as an excuse to experiment with the Elixir AST.

It’s still very much a PoC, but I’d be happy to discuss it. I think it could be useful as part of your CI pipeline once we get it to a good enough state, which is probably not where it’s at right now.

The work is inspired by mutant and pitest, but it is obviously less powerful ATM. The good thing is we can get there!

I have blogged about exavier here. The library’s GitHub repo, with more info and ways to contribute (if you find it even mildly interesting), is available here. Lots of good, simple additions are in the works.

Thanks. :blush:


This is very interesting, and I’ve been thinking of doing the same. From a cursory reading of the code, it looks like you mutate all operators in the file at the same time instead of one by one. Is that so? Shouldn’t one mutate one operator at a time to get more precise results?

To reduce the need to recompile a module many times, I’ve also tried a different approach: mutating the Erlang AST in a way that let me toggle mutations on and off without recompiling the code (see here: https://nwolverson.uk/devlog/2016/08/01/introducing-purescript-erlang.html), but I couldn’t get very good error messages on my first try. It also made it harder to generate relevant mutations (and impossible to generate mutations on macros).


Hey there @tmbb! Thanks so much for engaging. It means a lot.

To reduce the need to recompile a module lots of times, I’ve also tried to do it in a different way by mutating the Erlang AST in a way that allowed me to toggle mutations on and off without recompiling the code

That’s very smart! I might give that a go if I find it can speed up mutation testing significantly, which I’m thinking it will. Great suggestion!

From a cursory reading of the code, it looks like you mutate all operators in the file at the same time instead of one by one. Is that so? Shouldn’t one mutate one operator at a time to get more precise results?

You’re absolutely right, I do mutate everything in one go. That by itself can be seen as a limitation. It should be fairly easy to change, though. I’ve thought about it, and AFAIK mutation testing doesn’t prescribe how many mutations each mutant should contain, but you’re right that what I’m doing might make the output more verbose, and it might even make it harder to understand which change is needed. I’ll definitely consider this change.


Some more context regarding this second question and answer:

This approach of mutating all in one go was a trade-off I felt I could get away with for testing the feasibility of this PoC.

See, I have this problem: right now I’m not running each test "..." do block individually; instead, I’m running the whole test module (e.g., HelloWorldTest). This has one clear disadvantage, which I’ll explain with an example:

defmodule HelloWorld do
  def sum(a, b), do: a + b
  def divide(a, b), do: div(a, b)
end

defmodule HelloWorldTest do
  use ExUnit.Case

  test "when testing sum" do
    assert HelloWorld.sum(3, 0) == 3
  end

  test "when testing divide" do
    # div/2 is integer division, so div(5, 2) is 2
    assert HelloWorld.divide(5, 2) == 2
  end
end

If I change the code to the following:

defmodule HelloWorld do
  def sum(a, b), do: a - b # changed from + to - via AOR1
  def divide(a, b), do: div(a, b)
end

I will be running both tests instead of just the test for sum/2 (i.e., "when testing sum", the only one whose corresponding source code changed). To maximise the number of mutations I can catch per run of the entire test module, I mutate everything in one go. Does that make sense? Maybe it doesn’t… :man_facepalming: AFAIU, finding out which tests to run for a given source code change is hard. But I might be missing something very obvious. Let me know.

If you have ideas on how to improve this aspect of exavier, such as a good heuristic or an alternative approach, let me know as well, @tmbb. Again, thank you for your kind comment. I also appreciate you challenging my design. :star2:

Let’s make it better together! :heart:

When I tried to do it, it wasn’t as easy as you might think. I tried to traverse the AST while keeping a counter, so that I knew which operator to mutate, but I didn’t manage to make it work. I must have been doing something wrong.
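For reference, here is one way the counter idea can work, sketched under the assumption that we only mutate binary + into -. Macro.prewalk/3 threads an accumulator through a pre-order traversal, so the counter identifies each operator by the order in which it is visited. MutateNth, count_plus/1 and mutate_nth_plus/2 are hypothetical names, not part of exavier.

```elixir
defmodule MutateNth do
  # Hypothetical helper (not exavier's API): mutates one operator at a time
  # by threading a counter through Macro.prewalk/3.

  # Counts binary `+` nodes, i.e. how many single-operator mutants exist.
  def count_plus(ast) do
    {_ast, count} =
      Macro.prewalk(ast, 0, fn
        {:+, _meta, [_left, _right]} = node, count -> {node, count + 1}
        node, count -> {node, count}
      end)

    count
  end

  # Replaces only the n-th (0-based, pre-order) binary `+` with `-`;
  # every other node, including the other `+` nodes, is left unchanged.
  def mutate_nth_plus(ast, n) do
    {mutated, _count} =
      Macro.prewalk(ast, 0, fn
        {:+, meta, [_left, _right] = args}, count when count == n ->
          {{:-, meta, args}, count + 1}

        {:+, _meta, [_left, _right]} = node, count ->
          {node, count + 1}

        node, count ->
          {node, count}
      end)

    mutated
  end
end
```

Generating one mutant per index from 0 to count_plus(ast) - 1 gives one-operator-at-a-time mutants, at the price of one recompilation per mutant.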

Yes, that’s the main reason to mutate one at a time.

Recompiling your modules will in general be much slower than running the tests, so I think you’re optimizing for the right thing (i.e., reducing the need to recompile code, at the cost of possibly running more tests).

Because of what I said above, I don’t think you should even try to guess which tests to run…


This is similar to what pitest does (they manipulate the Java bytecode), but their job seems much easier, because the mapping between Java source and bytecode is much simpler than the mapping between Elixir source and the Erlang AST (Elixir expands lots of macros, which obscure the relationship between the Elixir source and the Erlang AST).

However, it might be possible to recognize the (possibly macro-expanded) Elixir operators in the Erlang code and mutate those. I have to look a little deeper. Another problem with this approach is that I don’t know how to mutate operators inside guard clauses (I can’t simply add arbitrary function calls there…).
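To make the toggling idea concrete, here is a minimal sketch of mutation switching done at the Elixir AST level instead of the Erlang AST level described above, so it is an illustration of the technique and not tmbb’s implementation. Every binary + is compiled once into a call that checks which mutant is active at runtime. MutantSwitch, activate/1, plus/3 and instrument/1 are hypothetical names, and storing the active id in :persistent_term is an assumption (ETS would work too).

```elixir
defmodule MutantSwitch do
  # Hypothetical sketch of mutation switching: each binary `+` is rewritten
  # once into a runtime check, so switching mutants needs no recompilation.

  # The active mutant id lives in :persistent_term; nil means "no mutant".
  # Reads are cheap, but updates are comparatively expensive.
  def activate(id), do: :persistent_term.put({__MODULE__, :active}, id)
  def active, do: :persistent_term.get({__MODULE__, :active}, nil)

  # Runtime replacement for `left + right` at mutation point `id`.
  def plus(id, left, right) do
    if active() == id, do: left - right, else: left + right
  end

  # Rewrites each binary `+` into MutantSwitch.plus(id, left, right),
  # numbering mutation points in pre-order.
  def instrument(ast) do
    {instrumented, _next_id} =
      Macro.prewalk(ast, 0, fn
        {:+, _meta, [left, right]}, id ->
          call =
            quote do
              unquote(__MODULE__).plus(unquote(id), unquote(left), unquote(right))
            end

          {call, id + 1}

        node, id ->
          {node, id}
      end)

    instrumented
  end
end
```

Switching mutants is then just activate/1 followed by a test run. The costs match the ones discussed in the thread: every instrumented operator pays for an extra remote call and a lookup, which slows the tests down, and the rewrite cannot be used inside guards, where only guard-safe expressions are allowed and a call like MutantSwitch.plus/3 is rejected.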

I have now tested this in practice, and I now believe that recompiling is not necessarily much slower than running the tests (although I’m compiling Erlang code, not Elixir code). The bottleneck might actually be running the tests (it depends on your test suite, of course)… And because I’m instrumenting the code to be able to switch mutations without recompiling it, my tests become slower: running the test suite for the unmutated Enum module takes about 1.7 s on my machine, while with my mutated code it takes ~3 s (almost twice as slow!).

However, if I can fail the whole suite as soon as a test fails, I might be able to kill a mutant in milliseconds, and in that case avoiding the recompilation of Elixir code might be worth it… I don’t know, but I’m not so certain I should proceed with Darwin’s approach instead of yours.
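ExUnit can already fail fast: since Elixir 1.8 it accepts a :max_failures option (also available as mix test --max-failures 1). A minimal sketch of enabling it in the test helper, assuming the mutation runner drives an ordinary ExUnit run per mutant:

```elixir
# test/test_helper.exs
# Stop the whole run as soon as one test fails (Elixir 1.8+). For mutation
# testing, a single failure is enough to count the mutant as killed.
ExUnit.start(max_failures: 1)
```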

Are you already trying one mutation at a time instead of all at once?

Hey @tmbb. No, I haven’t tried “one mutation at a time instead of all at once” yet. I’m now focusing on getting test coverage per individual test instead of per whole test module. This would let me evaluate mutation coverage in a more fine-grained way and get more realistic coverage (not as low, in practice) for each of the modifications I make.

Currently, I apply an (all-at-once) mutation to a module, run the whole test module on it and check for failures. If some test passes, I flag it as a survived mutant (bad). But sometimes I may be changing only function X and not function Y, while testing both X and Y in the same test module.

If I had finer-grained test coverage, I could tell that function Y is covered only by test Y and mutate only that function in the corresponding mutation run. Same goes for X. This would give better, more accurate coverage. But it’s not trivial to do without being a bit hacky. I’ll try to get there and keep posting any relevant developments here. Right now I’m not even pushing changes to my remote, since it’s all very experimental at this stage.
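A minimal sketch of the selection step, assuming per-test coverage has already been collected (for example with Erlang’s :cover tool, resetting its counters between tests, which is not shown here) into a map from test name to the set of {module, line} pairs that test executed. TestSelection and tests_for_mutation/3 are hypothetical names, and the coverage map shape is an assumption.

```elixir
defmodule TestSelection do
  # Hypothetical sketch: given per-test line coverage, pick only the tests
  # that execute at least one mutated line.
  # Assumed coverage shape: %{test_name => MapSet of {module, line}}.
  def tests_for_mutation(coverage, module, mutated_lines) do
    targets = MapSet.new(mutated_lines, &{module, &1})

    for {test_name, covered} <- coverage,
        not MapSet.disjoint?(covered, targets),
        do: test_name
  end
end
```

With the HelloWorld example, a mutation on the line of sum/2 would select only "when testing sum", so a passing "when testing divide" could no longer mask the result.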

Stay tuned. :wink:

I’ve just found out that recompiling modules many times (hundreds of times), even if the modules don’t change, consumes memory from the BEAM’s literal allocator. This memory doesn’t seem to be garbage collected, so it fills up and eventually causes errors. If you decide to generate mutants one at a time and each mutant requires a module recompilation, you might hit these errors…

I’m warning you because I found out about this problem the hard way (i.e., when I tried to recompile my test suite about 300 times for 300 mutations in an Elixir module). Fortunately, the way I’m generating mutants doesn’t require recompiling the code for each mutant, so I could work around it.


Lol, that’s awesome. ^.^

I wonder what is leaking. If you could reduce it to a simple test case (a recompilation loop?) and figure out whether it’s an Elixir or an OTP issue, then a bug report could be submitted. I know they both store information out of band; I wonder if it is ever cleaned up…
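A possible starting point for such a reduced test case, sketched under assumptions: it recompiles one small module in a loop and compares :erlang.memory(:total) before and after. RecompileLoop and RecompiledTarget are made-up names, no particular numbers are implied, and total memory is only a coarse signal; attributing any growth to the literal allocator specifically would need allocator-level statistics.

```elixir
defmodule RecompileLoop do
  # Hypothetical reproduction harness (not from the thread): recompiles the
  # same module `n` times and reports total VM memory before and after.
  @source """
  defmodule RecompiledTarget do
    def data, do: %{list: [1, 2, 3], string: "a literal"}
  end
  """

  def run(n) do
    before = :erlang.memory(:total)

    for _ <- 1..n do
      # Turn the loaded version into old code and purge it, so each
      # iteration loads a fresh copy of the module.
      :code.delete(RecompiledTarget)
      :code.purge(RecompiledTarget)
      Code.compile_string(@source)
    end

    # Only collects the calling process, so its own garbage does not
    # inflate the second reading.
    :erlang.garbage_collect()
    {before, :erlang.memory(:total)}
  end
end
```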

I’ll look into it. IMO, if we are recompiling the same module, we shouldn’t consume new memory from the literal allocator. And I did try to clean up the code as much as I could. I’m not as motivated to get to the bottom of this, though, because my approach no longer depends on recompiling modules…


Hi,

First, thanks for creating this library. I’m a big fan of mutation testing and was really glad to find exavier.

I’m actually having an issue running the mutation tests with the default timeout for mutate_module. Here is the stacktrace:

13:46:08.242 [error] GenServer Exavier.Server terminating
** (stop) exited in: Task.Supervised.stream(5000)
    ** (EXIT) time out
    (elixir 1.11.2) lib/task/supervised.ex:304: Task.Supervised.stream_reduce/7
    (elixir 1.11.2) lib/enum.ex:3461: Enum.reverse/1
    (elixir 1.11.2) lib/enum.ex:3054: Enum.to_list/1
    (exavier 0.3.0) lib/exavier/server.ex:59: Exavier.Server.handle_call/3
    (stdlib 3.13.2) gen_server.erl:706: :gen_server.try_handle_call/4
    (stdlib 3.13.2) gen_server.erl:735: :gen_server.handle_msg/6
    (stdlib 3.13.2) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.94.0>): :xmen

Is there a way to configure a longer timeout?

Thanks a lot in advance

Hey Luiz, thanks for your interest in exavier. For now I think you should be able to overcome that by setting EXAVIER_DEBUG=1.

MRs welcome to allow setting custom timeouts for each time-bound operation.
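The stack trace above shows Task.Supervised.stream(5000), which suggests results are collected from a task stream using the default 5000 ms timeout of Task.async_stream/3. A minimal sketch of what a configurable timeout could look like; MutationRunner, run_all/3 and its :timeout option are hypothetical and not exavier’s API, while :timeout and :on_timeout are real Task.async_stream/3 options.

```elixir
defmodule MutationRunner do
  # Hypothetical sketch (not exavier's code): runs mutants concurrently with a
  # caller-supplied timeout instead of Task.async_stream's 5_000 ms default.
  def run_all(mutants, run_fun, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, 5_000)

    mutants
    # :kill_task turns a timeout into an {:exit, :timeout} element
    # instead of exiting the caller.
    |> Task.async_stream(run_fun, timeout: timeout, on_timeout: :kill_task)
    # Results are emitted in input order by default, so zipping is safe.
    |> Enum.zip(mutants)
    |> Enum.map(fn
      {{:ok, result}, mutant} -> {mutant, result}
      {{:exit, :timeout}, mutant} -> {mutant, :timeout}
    end)
  end
end
```

Reporting timed-out mutants separately, as here, also avoids counting a slow mutant as either killed or survived.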


Thanks a lot. I’m still new to Elixir and wasn’t able to find this debug option.
And I’ll certainly try that MR and/or help with anything I can.

Hey guys,

I’m having another issue with the library. Can you give me a hand? For 99% of my test modules, I get this message:

10:38:11.461 [error] Could not find module  defined in option :test_files_to_modules for test/gateway_web/controllers/participant_controller_test.exs.
10:38:11.461 [error] GenServer Exavier.Server terminating
** (MatchError) no match of right hand side value: :ok
    (exavier 0.3.0) lib/exavier/cover.ex:8: Exavier.Cover.lines_to_mutate/2
    (exavier 0.3.0) lib/exavier/server.ex:20: anonymous fn/2 in Exavier.Server.handle_call/3
    (elixir 1.11.2) lib/enum.ex:2181: Enum."-reduce/3-lists^foldl/2-0-"/3
    (exavier 0.3.0) lib/exavier/server.ex:18: Exavier.Server.handle_call/3
    (stdlib 3.14) gen_server.erl:715: :gen_server.try_handle_call/4
    (stdlib 3.14) gen_server.erl:744: :gen_server.handle_msg/6
    (stdlib 3.14) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.94.0>): :xmen1

I have already tried referencing them in the .exavier file like this:

%{
 test_files_to_modules: %{
   "test/address_key_worker_test.exs" => Gateway.AddressKeyWorkerTest,

But it doesn’t work: it doesn’t recognize my test modules and results in a similar error (and even if it worked, it would be a lot of work to list every test module in the file and to get the team to keep it up to date).
Do you know what can be done?

Thanks in advance.

Hey Luiz,

test_files_to_modules should have test file paths as keys and, as values, the module that each test file is testing (i.e., not the test module itself, as you did). An example is the configuration exavier uses to mutation-test itself, here:

  test_files_to_modules: %{
    "test/exavier/mutators/aor1_test.exs" => Exavier.Mutators.AOR1,
    ...

Hi @dnlserrano, sorry for the late reply.
So I have to add a new line to this configuration file for each new test module? Is there a way for exavier to detect these modules automatically? (New idea for my future MR maybe? hehe)
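One possible direction for automatic detection, sketched under assumptions: derive the module under test from the test file path using the usual naming convention. TestPathInference, guess_module/1 and infer_all/1 are hypothetical and not part of exavier.

```elixir
defmodule TestPathInference do
  # Hypothetical helper (not part of exavier): guesses the module under test
  # by stripping "test/" and "_test.exs" and camelizing each path segment,
  # e.g. "test/my_app/foo_bar_test.exs" -> MyApp.FooBar.
  def guess_module(test_path) do
    test_path
    |> String.replace_prefix("test/", "")
    |> String.replace_suffix("_test.exs", "")
    |> Path.split()
    |> Enum.map(&Macro.camelize/1)
    |> Module.concat()
  end

  # Builds a test_files_to_modules-style map for every matching test file.
  def infer_all(glob \\ "test/**/*_test.exs") do
    glob
    |> Path.wildcard()
    |> Map.new(fn path -> {path, guess_module(path)} end)
  end
end
```

The convention does not always hold: test/exavier/mutators/aor1_test.exs maps to Exavier.Mutators.Aor1 rather than Exavier.Mutators.AOR1, and a Phoenix controller test under test/gateway_web/controllers/ maps to GatewayWeb.Controllers.ParticipantController, while Phoenix typically names the module GatewayWeb.ParticipantController. A real implementation would probably check each guess with Code.ensure_loaded?/1 and keep explicit entries in .exavier for the exceptions.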

Thanks for all the support.