Exavier - Mutation Testing library for Elixir

I created this lib to learn more about code compilation in Elixir and about ex_unit, and also as an excuse to experiment with the Elixir AST.

It’s still very much a PoC, but I’d be happy to discuss it. I think this can be useful to run as part of your CI pipeline if we get it to a good enough state, which is probably not where it’s at right now.

The work is inspired by mutant and pitest, but obviously less powerful ATM. Good thing is we can get there!

I have blogged about exavier here. The library GitHub repo with other info and ways of contributing (if you find it mildly interesting) is available here. Lots of good simple additions are in the works.

Thanks. :blush:


This is very interesting, and I’ve been thinking of doing the same. From a cursory reading of the code, it looks like you mutate all operators in the file at the same time instead of one by one. Is that so? Shouldn’t one mutate one operator at a time to get more precise results?

To reduce the need to recompile a module lots of times, I’ve also tried to do it in a different way by mutating the Erlang AST in a way that allowed me to toggle mutations on and off without recompiling the code (see here: https://nwolverson.uk/devlog/2016/08/01/introducing-purescript-erlang.html), but I couldn’t get very good error messages on my first try. It also made it harder to generate relevant mutations, and it made it impossible to generate mutations on macros.
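To make the "toggle without recompiling" idea concrete, here is a minimal sketch that works on the Elixir AST rather than the Erlang AST described above. Everything in it is an assumption for illustration: the MutationSwitch module, its function names, and the use of :persistent_term as the switch are not taken from any existing library. Each binary + or - is replaced by a call that picks the original or the mutated operator at runtime.

```elixir
defmodule MutationSwitch do
  # Hypothetical sketch; names are illustrative, not an existing library's API.
  @key {__MODULE__, :active}

  # Selects which mutation id is live; nil means "run the original code".
  def activate(id), do: :persistent_term.put(@key, id)
  def deactivate, do: :persistent_term.put(@key, nil)

  # Stands in for the original operator and picks the variant at runtime.
  # A remote call like this one is not allowed inside a guard clause.
  def binary(id, original, mutated, left, right) do
    op = if :persistent_term.get(@key, nil) == id, do: mutated, else: original
    apply(Kernel, op, [left, right])
  end

  # Rewrites every binary + or - into a call to binary/5 with a fresh id.
  def instrument(ast) do
    {instrumented, _next_id} =
      Macro.prewalk(ast, 0, fn
        {op, _meta, [left, right]}, id when op in [:+, :-] ->
          call =
            quote do
              MutationSwitch.binary(
                unquote(id),
                unquote(op),
                unquote(swap(op)),
                unquote(left),
                unquote(right)
              )
            end

          {call, id + 1}

        node, id ->
          {node, id}
      end)

    instrumented
  end

  defp swap(:+), do: :-
  defp swap(:-), do: :+
end

# Instrument once, then flip mutations without touching the code again.
ast = "x + y - 1" |> Code.string_to_quoted!() |> MutationSwitch.instrument()

MutationSwitch.deactivate()
{original_value, _} = Code.eval_quoted(ast, x: 5, y: 2)

MutationSwitch.activate(0)
{mutant_value, _} = Code.eval_quoted(ast, x: 5, y: 2)
```

The extra function call on every operator is the price of this design: the instrumented code does more work per operation than the original, even when no mutation is active.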


Hey there @tmbb! Thanks so much for engaging. It means a lot.

To reduce the need to recompile a module lots of times, I’ve also tried to do it in a different way by mutating the Erlang AST in a way that allowed me to toggle mutations on and off without recompiling the code

That’s very smart! I might give that a go if I find it can speed up mutation testing significantly, which I’m thinking it will. Great suggestion!

From a cursory reading of the code, it looks like you mutate all operators in the file at the same time instead of one by one. Is that so? Shouldn’t one mutate one operator at a time to get more precise results?

You’re absolutely right, I do mutate all in one go. That by itself can be seen as a limitation. It should be fairly easy to change it though. I’ve thought about that, and AFAIK mutation testing doesn’t say anything about the number of mutations each mutant should have, but you’re right in that what I’m doing might make the output more verbose, and it might even make it harder to understand which change is needed. I’ll definitely consider this change.

Some more context regarding this second question and answer:

This approach of mutating all in one go was a trade-off I felt I could get away with for testing the feasibility of this PoC.

See, I have this problem: right now I’m not running each test ... do block individually; instead I’m running the whole test module (e.g., HelloWorldTest). This has one clear disadvantage, which I’ll explain below with an example:

defmodule HelloWorld do
  def sum(a, b), do: a + b
  def divide(a, b), do: div(a, b)
end

defmodule HelloWorldTest do
  use ExUnit.Case

  test "when testing sum" do
    assert HelloWorld.sum(3, 0) == 3
  end

  test "when testing divide" do
    # div/2 is integer division, so div(5, 2) is 2
    assert HelloWorld.divide(5, 2) == 2
  end
end

If I change code to the following:

defmodule HelloWorld do
  def sum(a, b), do: a - b # changed from + to - via AOR1
  def divide(a, b), do: div(a, b)
end

I will be running both tests, instead of just the test for sum/2 (i.e., "when testing sum", which is the only one whose corresponding source code changed). To maximise the number of mutations I can catch when running the entire test module, I mutate all in one go. Does that make sense? Maybe it doesn’t… :man_facepalming: AFAIU, finding out which tests I should run per source code change is hard. But I might not be seeing something very obvious. Let me know.
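For reference, one way to run a single test from a module is through ExUnit's include/exclude filters. The following is only a sketch under that assumption and says nothing about how exavier runs tests. It redefines the example modules so that it is self-contained, and it does not solve the harder problem of knowing which test covers which function.

```elixir
# Assumes ExUnit has not been started yet in this VM.
ExUnit.start(autorun: false)

defmodule HelloWorld do
  def sum(a, b), do: a + b
  def divide(a, b), do: div(a, b)
end

defmodule HelloWorldTest do
  use ExUnit.Case

  test "when testing sum" do
    assert HelloWorld.sum(3, 0) == 3
  end

  test "when testing divide" do
    assert HelloWorld.divide(5, 2) == 2
  end
end

# Exclude every test, then re-include the one whose name matches.
# ExUnit prefixes the registered test name with "test ".
ExUnit.configure(exclude: [:test], include: [test: "test when testing sum"])

# Runs the registered test modules and returns a map of counters.
result = ExUnit.run()
```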

If you have some ideas on how to improve this aspect of exavier, if you have a good heuristic or alternative, let me know as well @tmbb. Again, thank you for your kind comment. I also appreciate you challenging my design. :star2:

Let’s make it better together! :heart:

When I tried to do it it didn’t seem as easy as you might think. I tried to traverse the AST while keeping a counter, so that I knew which operator to mutate, but I didn’t manage to make it work. I must have been doing something wrong.
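For what it's worth, the counter idea can be sketched on the Elixir AST with Macro.prewalk/3, which threads an accumulator through the traversal. This is a hypothetical helper (the OneAtATime module and its functions are made up for illustration), restricted to the four binary arithmetic operators: it produces one mutant per operator occurrence, each differing from the original in exactly one place.

```elixir
defmodule OneAtATime do
  # Hypothetical helper for illustration; not taken from exavier or Darwin.
  # Only binary arithmetic operators are considered.
  @ops [:+, :-, :*, :/]

  # Returns one mutated copy of `ast` per operator occurrence.
  def mutants(ast) do
    for index <- 0..(count(ast) - 1)//1, do: mutate_nth(ast, index)
  end

  # Counts operator occurrences (a target of -1 never matches).
  def count(ast) do
    {_ast, seen} = Macro.prewalk(ast, 0, &visit(&1, &2, -1))
    seen
  end

  # Swaps only the operator whose traversal position equals `index`.
  def mutate_nth(ast, index) do
    {mutated, _seen} = Macro.prewalk(ast, 0, &visit(&1, &2, index))
    mutated
  end

  # The accumulator is the number of operators seen so far.
  defp visit({op, meta, [_, _] = args}, seen, target) when op in @ops do
    node = if seen == target, do: {swap(op), meta, args}, else: {op, meta, args}
    {node, seen + 1}
  end

  defp visit(node, seen, _target), do: {node, seen}

  defp swap(:+), do: :-
  defp swap(:-), do: :+
  defp swap(:*), do: :/
  defp swap(:/), do: :*
end

mutant_sources =
  "a + b * c"
  |> Code.string_to_quoted!()
  |> OneAtATime.mutants()
  |> Enum.map(&Macro.to_string/1)

# mutant_sources is ["a - b * c", "a + b / c"]
```

One caveat of this naive matching: a function capture such as &foo/2 is also represented with a / node in the AST, so a real implementation would need to skip those.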

Yes, that’s the main reason to mutate one at a time.

Recompiling your modules will in general be much slower than running the tests, so I think you’re optimizing for the right thing (i.e., reducing the need to recompile code, at the cost of possibly running more tests)

Because of what I said above, I don’t think you should even try to guess which tests to run…


This is similar to what pitest does (they manipulate the Java bytecode), but it looks like their job is much easier because the mapping between the source and the bytecode is much simpler than between Elixir and the Erlang source code (Elixir expands lots of macros, which obfuscate the relationship between the Elixir source and the Erlang AST).

However, it might be possible to recognize the (possibly macroexpanded) Elixir operators in the Erlang source and mutate that. I have to look a little deeper. Another problem with this approach is that I don’t know how to mutate operators inside guard clauses (I can’t simply add arbitrary functions there…)

I have now tested this in practice and I believe it’s not true (although I’m compiling Erlang code, not Elixir code). The bottleneck might actually be running the tests (it depends on your test suite, of course)… And because I’m instrumenting the code to be able to switch mutations without compiling the code, my tests become slower. Running the test suite for the unmutated Enum module takes about 1.7s on my machine. With my mutated code it takes ~3s (almost twice as slow!).

However, if I can fail the whole suite as soon as a test fails, then I might be able to kill a mutant in milliseconds, and in that case avoiding the recompilation of Elixir code might be worth it… I don’t know, but I’m not so certain I should proceed with Darwin’s approach instead of yours.

Are you already trying one mutation at a time instead of all at once?

Hey @tmbb. No, I haven’t tried “one mutation at a time instead of all at once”. I’m now focusing on trying to have test coverage by individual test instead of by each whole module. This would allow me to evaluate mutation coverage in a more fine-grained way and have more realistic coverage (not as low, in practice), for each of the modifications I make.

Currently, I run an (all-at-once) mutation on a module, run the whole test module on it and check for failures. If some test passes, then I flag it as a survived mutant (bad). But sometimes, I may be changing only function X and not function Y, but testing functions X and Y as part of the same test module.

If I can have finer-grained test coverage, I could understand only function Y is covered by test Y, and only try and modify that on the mutation run. Same goes for X. Better, more accurate coverage would come out of this. But it’s not trivial to do without being a bit hacky. I’ll try and get there. I’ll keep on posting any relevant developments here. Right now I’m not even pushing changes to my remote, since it’s all very experimental at this stage.

Stay tuned. :wink:

I’ve just found out that recompiling modules lots of times (~ hundreds of times), even if the modules don’t change, consumes memory from the BEAM’s literal allocator. This memory doesn’t seem to be garbage collected and it will fill up and cause errors. If you decide to generate mutants one at a time and each mutant requires a module recompilation, you might hit these errors…

I’m just warning you because I found out about this problem the hard way (i.e., when I tried to recompile my test suite about 300 times for 300 mutations in an Elixir module). Fortunately, the way I’m generating mutants doesn’t require recompiling the code for each mutant, so I could work around it.


Lol, that’s awesome. ^.^

I wonder what is leaking. If you could reduce it to a simple test case (a recompilation loop?) and figure out whether it is an Elixir or OTP issue, then a bug report could be submitted. I know they both store information out of band; I wonder if it is ever cleaned up…

I’ll look into it. IMO, if we are recompiling the same module, we shouldn’t consume new memory from the literal allocator. And I did try to clean up the code as much as I could. I’m not as motivated to get to the bottom of this, though, because my approach no longer depends on recompiling modules…
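A recompilation loop along those lines might look like the sketch below. It is a hypothetical harness: the RecompileLoop module and its target module are placeholders, and it has not been confirmed to reproduce the growth described above. It only reports total VM memory via :erlang.memory/1, which is a coarse measure and does not single out the literal allocator.

```elixir
defmodule RecompileLoop do
  # Hypothetical reproduction harness; the target module is a placeholder.
  @source """
  defmodule RecompileLoop.Target do
    def greeting, do: "hello"
  end
  """

  # Compiles the same source `times` times and returns total VM memory
  # (in bytes) before and after, as reported by :erlang.memory/1.
  def run(times) do
    # Silence the "redefining module" warning for repeated compilation.
    Code.compiler_options(ignore_module_conflict: true)
    before_bytes = :erlang.memory(:total)

    for _ <- 1..times do
      [{module, _bytecode}] = Code.compile_string(@source)
      # Drop the previous version of the module, if one is still around.
      :code.purge(module)
    end

    :erlang.garbage_collect()
    {before_bytes, :erlang.memory(:total)}
  end
end

{before_bytes, after_bytes} = RecompileLoop.run(20)
```

Raising the iteration count to a few hundred and comparing the two numbers across runs would be the first step towards deciding whether there is something to report.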
