Mutation testing - Mutating BEAM bytecode

tmbb · August 18, 2019, 5:39pm

I have a mostly functioning end-to-end system now (with some hardcoded constants that need to be made configurable). I’ve decided to run mutation testing on Elixir’s Enum module. It turns out that mutating something as widely used as Enum itself causes ExUnit to fail in uninteresting ways, so I’ve copied the source and renamed the module Enom so that I can run the test suite against it. First, some terminology. An AST node that is mutated is called a codon (the name comes from molecular biology, where a codon is a sequence of three DNA nucleotides that encodes an amino acid). A single codon can give rise to more than one mutant.
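To make the codon/mutant distinction concrete, here is a minimal sketch that works on quoted Elixir AST. The module `MutatorSketch` and its operator-replacement table are hypothetical illustrations, not Darwin's actual API: a single `a + b` node (one codon) yields two mutants.

```elixir
defmodule MutatorSketch do
  # Hypothetical operator-replacement table: each arithmetic operator
  # (the codon) maps to the operators its mutants use instead.
  @replacements %{
    :+ => [:-, :*],
    :- => [:+, :*],
    :* => [:+, :-]
  }

  # Returns every mutant of a binary-operator AST node, or [] when the
  # node is not a codon this sketch knows how to mutate.
  def mutants({op, meta, [_left, _right] = args}) when is_atom(op) do
    @replacements
    |> Map.get(op, [])
    |> Enum.map(fn new_op -> {new_op, meta, args} end)
  end

  def mutants(_other), do: []
end

# Usage: one codon, two mutants.
quote(do: a + b)
|> MutatorSketch.mutants()
|> Enum.map(&Macro.to_string/1)
# => ["a - b", "a * b"]
```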

It’s been running for about an hour and it’s still going (about half of the mutations are done). I don’t think waiting this long for a little fewer than 100 codons is acceptable, especially because the number of codons per module will increase once I finish implementing all the mutators…

The main problem is that many mutations cause the functions in Enum to loop endlessly. Tests with infinite loops only fail when they hit the test timeout (60 s by default). ExUnit can be configured to stop as soon as a single test fails, but it still runs the whole test case… That’s very bad for me: my logs show that each of my mutations in the Enum module causes at least 70 tests to fail (sometimes more), some of them because of timeouts (I’m not logging the failure reason, so I can’t be sure how many fail because of infinite loops). Any time spent running tests after the first one has failed is wasted, and I can tell I’m wasting too much of it.

Keep in mind that none of this time is being spent recompiling the Enom module (the module being tested). The Enom module is compiled only once, and the mutations are toggled at runtime using the tricks above… So I have only two ways to optimize this:
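Purely as an illustration of the compile-once idea (not necessarily the tricks referred to above), the sketch below compiles a codon together with all of its mutants and picks one at runtime, so switching mutants needs no recompilation. The module `ToggleSketch`, the `:toggle_sketch` application name and the `{codon_id, mutant_id}` encoding are all hypothetical.

```elixir
defmodule ToggleSketch do
  # Hypothetical: the active mutant is stored as {codon_id, mutant_id}
  # under a made-up application env key.
  @app :toggle_sketch
  @key :active_mutant

  def activate(codon_id, mutant_id),
    do: Application.put_env(@app, @key, {codon_id, mutant_id})

  def deactivate, do: Application.delete_env(@app, @key)

  # Codon 1 is the original `a + b`; both of its mutants are compiled in
  # alongside it, and the env lookup decides which one runs.
  def add(a, b) do
    case Application.get_env(@app, @key) do
      {1, 1} -> a - b
      {1, 2} -> a * b
      _ -> a + b
    end
  end
end

ToggleSketch.add(2, 3)    # => 5
ToggleSketch.activate(1, 1)
ToggleSketch.add(2, 3)    # => -1
ToggleSketch.deactivate()
```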

Use shorter timeouts: this may help, but it will also make “legitimate” (slow but correct) tests fail
Hack ExUnit so that it actually aborts the whole test suite as soon as the first test fails
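For reference, the first option and the existing fail-fast setting mentioned earlier correspond to ordinary ExUnit options (`:timeout` and `:max_failures`). The 5-second value below is an arbitrary example, not a recommendation:

```elixir
# test/test_helper.exs
# :timeout is the per-test timeout in milliseconds (ExUnit's default is
# 60_000); 5_000 is an arbitrary example value.
# :max_failures stops the suite once that many test failures are reached.
ExUnit.start(timeout: 5_000, max_failures: 1)
```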

The places in the code I would need to change seem to be very “deep” in ExUnit’s source, so I will have to write my own test runner and rewrite most of ExUnit. I hope I can keep compatibility with ExUnit: I like its user-facing interface, even if some of its internal implementation details are a little inconvenient for my purposes. The goal is to be able to write “normal” ExUnit test cases and have Darwin extract the test cases somehow.

I could then write my own test runner which actually interrupts the test suite as soon as something fails. This is something that might benefit other mutation testing libraries, such as exavier from dnlserrano. I wonder if he might be interested in working on an alternative test runner.
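The core loop of such a runner fits in a few lines. `FailFastSketch` is hypothetical and far simpler than ExUnit (no setup callbacks, no async tests): it runs each zero-arity test in its own task, kills any test that exceeds a per-test timeout, and halts at the first failure, so no time is spent on the tests that follow.

```elixir
defmodule FailFastSketch do
  # Hypothetical runner sketch (not ExUnit's or Darwin's API). `tests` is a
  # list of {name, zero_arity_fun}; tests run one at a time, in order.
  def run(tests, timeout_ms) do
    Enum.reduce_while(tests, :ok, fn {name, fun}, :ok ->
      task = Task.async(fn -> safe_call(fun) end)

      # Task.yield/2 returns nil if the test has not finished in time; the
      # test (maybe stuck in a mutant's infinite loop) is then killed.
      result = Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill)

      case result do
        {:ok, :ok} -> {:cont, :ok}
        # First failure: halt, so the remaining tests are never run.
        {:ok, {:error, reason}} -> {:halt, {:failed, name, reason}}
        {:exit, reason} -> {:halt, {:failed, name, {:exit, reason}}}
        nil -> {:halt, {:timeout, name}}
      end
    end)
  end

  # Turns raises, throws and exits inside a test into an error tuple, so a
  # failing test never takes the runner down with it.
  defp safe_call(fun) do
    fun.()
    :ok
  catch
    kind, reason -> {:error, {kind, reason}}
  end
end
```

Killing the task with `Task.shutdown(task, :brutal_kill)` also works for a mutant stuck in a busy loop, because the BEAM scheduler preempts it; the cost of an infinite loop drops from the 60 s default to whatever `timeout_ms` is chosen.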

Another possible optimization is to spawn independent BEAM instances and split the mutations among them. In my architecture, a single BEAM instance can only test one mutation at a time, so having more BEAM instances would allow some extra parallelization (even on the same machine). The other instances would consume extra cores, of course, which would limit ExUnit’s ability to run tests in parallel, but this might be offset by the ability to test multiple mutations in parallel.
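Splitting the work could be as simple as dealing mutants round-robin into one shard per instance. `ShardSketch` is a hypothetical helper; how each shard is handed to its own BEAM (for example, one separate OS process per shard) is left out.

```elixir
defmodule ShardSketch do
  # Hypothetical helper (not part of Darwin): deals mutants round-robin
  # into `n` shards, one per independent BEAM instance. Round-robin keeps
  # shard sizes within one of each other (not their run times, which vary
  # per mutant).
  def shard(mutants, n) when is_integer(n) and n > 0 do
    groups =
      mutants
      |> Enum.with_index()
      |> Enum.group_by(fn {_mutant, i} -> rem(i, n) end, fn {mutant, _i} -> mutant end)

    # Always return exactly n shards, empty ones included, in shard order.
    for k <- 0..(n - 1), do: Map.get(groups, k, [])
  end
end

ShardSketch.shard(Enum.to_list(1..7), 3)
# => [[1, 4, 7], [2, 5], [3, 6]]
```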