Mutation testing on Elixir 1.14?


As some of you may already have heard, a coworker and I took over a rather large codebase a while ago, and as there is no one walking us through it, we do a lot of discovery on our own.

Even though there is a nominal test coverage of ~90%, the following problems exist:

We already learned that, for some reason, mix test --stale reruns several unrelated tests, and we assume this is due to massive dependency cycles within the codebase. That is not the problem I want to discuss here.

Sadly, we also discovered that after a change, mix test --stale sometimes does not run the relevant tests. We assume this is due to massive (ab)use of mocking.

Even worse: we accidentally removed whole modules (implementations) and the full test suite still passed, so we have lost some, if not all, confidence in the suite.

To regain confidence, I wanted to run some mutation testing and tried exavier (by @dnlserrano) and muzak (by @devonestes). Sadly, neither has received any updates in the last 18+ months, and neither works with Elixir 1.14 (which we upgraded to at work).

Is anyone aware of an actively developed mutation testing library that is compatible with 1.14 and does not require massive code changes?

Also, exavier’s direct mapping between test files and modules is actually more a hindrance than a benefit: even though the mapping (mostly) exists in a 1:1 manner, it doesn’t match exavier’s inference, and we had to do a lot of manual overrides (there are 700 modules; my guess is that exavier could infer only about 100 of them).

And a semi-related question: do I understand (the idea behind) mutation testing correctly, that it is made for exactly this kind of situation, i.e. to regain or strengthen confidence in the test suite?

I don’t think mutation testing is necessarily going to help much given the problems you’ve described: the changes it makes (random example: ROR3 from Exavier) are mostly function-scale. That kind of mutation is good for making sure your tests cover both the < and the == cases of a <= comparison, but “we removed a whole module and the tests still passed” is a bigger issue.
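To make the function-scale point concrete, here is a minimal ExUnit sketch of the boundary tests that relational-operator mutations push you toward. `Limits` and `within_limit?/2` are hypothetical, and the exact operator swap each of Exavier’s ROR variants performs is not restated here; the comments only note which hypothetical mutant of `<=` each test would kill.

```elixir
ExUnit.start()

defmodule Limits do
  # Hypothetical function under test. A relational-operator mutation might
  # rewrite `<=` into `<`, `==`, `>=` and so on.
  def within_limit?(amount, limit), do: amount <= limit
end

defmodule LimitsTest do
  use ExUnit.Case, async: true

  # Kills a `<=` -> `==` mutant (5 == 10 is false).
  test "below the limit is allowed" do
    assert Limits.within_limit?(5, 10)
  end

  # Kills a `<=` -> `<` mutant (10 < 10 is false).
  test "exactly at the limit is allowed" do
    assert Limits.within_limit?(10, 10)
  end

  # Kills mutants that flip the direction, e.g. `<=` -> `>=` (11 >= 10 is true).
  test "above the limit is rejected" do
    refute Limits.within_limit?(11, 10)
  end
end
```

Drop the middle test and a `<` mutant survives (5 < 10 is still true, 11 < 10 is still false), which is exactly the kind of gap this mutation is meant to expose. It says nothing about a module that was deleted wholesale.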

My interpretation of your situation is that the codebase has fallen into a mocking trap; there are real implementations that are replaced with a mock EVERYWHERE and not tested individually. Some ways to address that:

  • write specific tests for the thing that’s being mocked everywhere. Ideally there would be a corresponding test for every scenario that’s set up in the mocks, to demonstrate that the real thing actually does what the mocks are pretending it does.

  • write higher-level integration tests that don’t use mocks. For a legacy codebase, the “happy path” is a good place to start. These will be slower than isolated unit tests, so you may want to tag them and run them as a separate CI step.
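A minimal sketch of both suggestions, assuming a hypothetical `MyApp.PriceCalculator` that the rest of the suite mocks and a hypothetical `MyApp.Checkout` that calls it. The first test module checks the real implementation against the scenarios the mocks fake; the second is a mock-free happy-path test tagged `:integration`. It is written as one `.exs` file so it runs with `elixir`; in a Mix project the `ExUnit.start/1` line would live in `test/test_helper.exs`.

```elixir
# Stand-in for test/test_helper.exs: tests tagged :integration are skipped by default.
ExUnit.start(exclude: [:integration])

defmodule MyApp.PriceCalculator do
  # Hypothetical real implementation that other tests replace with a mock.
  def total(items) do
    items
    |> Enum.map(fn {price, qty} -> price * qty end)
    |> Enum.sum()
  end
end

defmodule MyApp.Checkout do
  # Hypothetical caller that unit tests would normally exercise with a mocked calculator.
  def summary(items), do: "Total: #{MyApp.PriceCalculator.total(items)}"
end

defmodule MyApp.PriceCalculatorTest do
  use ExUnit.Case, async: true

  # One test per scenario the mocks pretend to handle, run against the real module.
  test "an empty cart totals to zero" do
    assert MyApp.PriceCalculator.total([]) == 0
  end

  test "multiplies price by quantity and sums the lines" do
    assert MyApp.PriceCalculator.total([{300, 2}, {150, 1}]) == 750
  end
end

defmodule MyApp.CheckoutIntegrationTest do
  use ExUnit.Case

  # Tags every test in this module, so the exclusion above applies to all of them.
  @moduletag :integration

  test "happy path through the real modules, no mocks" do
    assert MyApp.Checkout.summary([{300, 2}]) == "Total: 600"
  end
end
```

With that exclusion in place, a plain `mix test` skips the tagged module, while `mix test --include integration` (or `mix test --only integration`) runs it, which makes it easy to give the slower tests their own CI step.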

The “deleted a whole module” case was an extreme example.

And in this case, as we learned today, it was indeed dead code. It still leaves a bitter taste.

We hoped that even small mutations could help us identify false coverage earlier, and that we could integrate mutation testing into our weekly flow: run it about once a week and tackle the mutants that failed to fail.

Of course, we also have strategies for other improvements, but since we always have to keep in mind that someone has to pay for everything, we can’t pursue all of them at the same time, so we try to find quick ways to uncover the easier-to-fix issues.

There was a mini-discussion on DevTalk a while ago here – What dev-related stuff have you been up to? - #265 by davearonson - General Dev Chat - Devtalk – where the guy answered a few questions about mutation testing. TL;DR: I still don’t see its value. Still, the fact that the libraries you mentioned aren’t receiving updates might just mean they are considered finished, so if you’re convinced you want mutation testing, I’d say just go for them.

(It seems to me that mutation testing boils down to this: if mutated code that’s smaller than the original still passes the tests, then the code very likely can be shrunk.)

In your case, I’d say plain old observation plus adding test coverage is the best way to learn the code and gradually tighten your grip on it. It’s a long and tedious process, though, and I don’t know the realities of the time and financial budgets for that project, so that advice is a bit academic, sadly.

Something that may or may not save you time: property tests. But it’s really a game of chance; it helped me once and definitely cost a ton of time a few other times after that. The one occasion it helped me was exactly when I was unsure which modules did what (because they were intertwined with 20+ others and it became difficult for a human to follow).

No, that’s not what mutation testing is about; mutation testing is meant to uncover missed edge cases.

def add(a, b), do: a + 1

test "adds one", do: assert 2 == add(1, 1)

This has 100% test coverage, yet it’s obviously missing edge cases.

Mutation testing should help you find the less obvious ones.

In short: it should help you regain confidence in the tests.

It’s basically testing the tests, not your code.

And the problem with the libraries I already mentioned is that they use some internal API and do not work with Elixir 1.14.

That is not exactly why we have mutation testing. Mutation testing is something slightly different: its purpose is to test our tests. So, using your example:

def add(a, b), do: a + b

test "adds one", do: assert 2 == add(1, 1)

This is a correct test, but the idea of mutation testing is that if I change the code to:

def add(a, b), do: a - b

Then our tests should fail.

So with mutation testing, we test how well our tests catch accidental bugs that change the logic. This is slightly different from property testing, which checks whether properties hold: mutation testing checks how well our tests are written and whether we have tests that are moot (their outcome does not change even when the logic changes).
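That loop can be sketched by hand. This is a toy, not how exavier or muzak work internally: the mutants of `def add(a, b), do: a + b` are listed manually instead of being generated from the AST, and the suite is reduced to `{args, expected}` pairs (here only the `add(1, 1) == 2` case from the post). `MutationSketch` and its functions are made-up names for illustration.

```elixir
defmodule MutationSketch do
  # The original implementation: def add(a, b), do: a + b
  def original, do: fn a, b -> a + b end

  # Hand-written mutants; a real tool would derive these from the code automatically.
  def mutants do
    [
      {"a - b", fn a, b -> a - b end},
      {"a * b", fn a, b -> a * b end},
      {"a + a", fn a, _b -> a + a end},
      {"a + 1", fn a, _b -> a + 1 end}
    ]
  end

  # The suite from the post as {args, expected} pairs: add(1, 1) == 2.
  def suite, do: [{{1, 1}, 2}]

  # A mutant is "killed" when at least one test case fails against it.
  def killed?(fun, suite) do
    Enum.any?(suite, fn {{a, b}, expected} -> fun.(a, b) != expected end)
  end

  def report(suite) do
    # Mutation testing only makes sense if the suite passes on the original code.
    if killed?(original(), suite), do: raise("the suite already fails on the original code")

    for {name, fun} <- mutants() do
      {name, if(killed?(fun, suite), do: :killed, else: :survived)}
    end
  end
end

IO.inspect(MutationSketch.report(MutationSketch.suite()))
# prints [{"a - b", :killed}, {"a * b", :killed}, {"a + a", :survived}, {"a + 1", :survived}]
```

Against the single `add(1, 1) == 2` case, the `a + a` and `a + 1` mutants survive, and `a + 1` is precisely the buggy `add` from the earlier post. Adding one more case such as `{{2, 3}, 5}` kills all four mutants, which is the kind of feedback about the tests that a mutation run is meant to give.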

I think it’s even fair to say it’s quite radically different. In a way, property testing is the opposite idea: you “play” with the inputs fed into the tested code, but you don’t modify that code.
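For contrast, a hand-rolled property check: the implementation stays untouched and only the inputs vary. This is a dependency-free sketch; in practice you would more likely reach for the StreamData library and its `ExUnitProperties` macros. The `PropertySketch` module, the input range and the 100-run default are arbitrary choices for illustration.

```elixir
defmodule PropertySketch do
  # Runs `property` on `runs` random integer pairs.
  # Returns :ok, or {:counterexample, {a, b}} for the first failing pair.
  def check(property, runs \\ 100) do
    Enum.reduce_while(1..runs, :ok, fn _, :ok ->
      a = Enum.random(-1_000..1_000)
      b = Enum.random(-1_000..1_000)

      if property.(a, b), do: {:cont, :ok}, else: {:halt, {:counterexample, {a, b}}}
    end)
  end
end

add = fn a, b -> a + b end
buggy_add = fn a, _b -> a + 1 end

# Properties of addition: commutativity, and 0 as the identity element.
prop = fn impl -> fn a, b -> impl.(a, b) == impl.(b, a) and impl.(a, 0) == a end end

IO.inspect(PropertySketch.check(prop.(add)))
# prints :ok

IO.inspect(PropertySketch.check(prop.(buggy_add)))
# prints {:counterexample, {a, b}} for the first random pair tried
```

Here the code under test never changes; the randomly generated inputs are what expose the buggy `a + 1` version. A mutation run does the reverse: it keeps the test inputs fixed and changes the code.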

Indeed, mutation testing is property testing for your tests :smiley: