Mutation testing on Elixir 1.14?

NobbZ · November 21, 2022, 7:46am

Hi!

As some may already have heard, a coworker and I took over a rather large codebase a while ago, and as there is no one walking us through it, we do a lot of discovery on our own.

And even though there is a nominal test coverage of ~90%, the following problems exist:

We already learned that, for some reason, mix test --stale reruns several unrelated tests, and we assume that this is due to massive dependency cycles within the codebase. This is not the problem I want to discuss here.

Sadly, we also discovered that sometimes, after a change, mix test --stale does not run relevant tests. We assume that this is due to massive (ab)use of mocking.

Even worse: We removed whole modules (implementations) by accident, and the full test suite still passed, and we therefore lost some, if not all, confidence in the suite.

To regain confidence I wanted to run some mutation testing and tried exavier (by @dnlserrano) and muzak (by @devonestes). Sadly neither has received any updates in the last 18+ months, and neither worked with Elixir 1.14 (which we upgraded to at work).

Is anyone aware of an actively developed mutation testing library that is compatible with 1.14 and does not require massive code changes?

Also, exavier’s direct mapping between test files and modules is actually more of a hindrance than a benefit for us: even though the mapping (mostly) exists in a 1:1 manner, it doesn’t match exavier’s inference, and we had to do a lot of manual overrides (there are 700 modules; my guess is that only 100 could be inferred by exavier).

And a semi-related question: Do I understand (the idea behind) mutation testing correctly, in that it is made exactly for this kind of situation, to regain/strengthen confidence in the test suite?

al2o3cr · November 21, 2022, 2:07pm

I don’t think mutation testing is necessarily going to help much given the problems you’ve described - the changes it makes (random example: ROR3 from Exavier) are mostly function-scale. That kind of mutation is good for making sure your tests cover the < and the = situations for an <= comparison, but “we removed a whole module and the tests still passed” is a bigger issue.
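To make the <= case concrete, here is a minimal sketch (not Exavier's actual output; the module, the 5 kg rule and the "suites" are all made up for illustration). A suite is modelled as a list of {input, expected} pairs, and a mutant survives if the suite still passes against it:

```elixir
defmodule RorDemo do
  # Hypothetical rule: parcels up to and including 5 kg ship for free.
  def original(weight), do: weight <= 5

  # Two mutants a relational-operator replacement could produce.
  def mutant_lt(weight), do: weight < 5
  def mutant_eq(weight), do: weight == 5

  # A suite passes against an implementation if every pair matches.
  def passes?(fun, suite) do
    Enum.all?(suite, fn {input, expected} -> fun.(input) == expected end)
  end

  # Only checks a value far away from the boundary.
  def weak_suite, do: [{10, false}]

  # Checks below, at, and above the boundary.
  def strong_suite, do: [{4, true}, {5, true}, {6, false}]
end

for {name, fun} <- [{"<", &RorDemo.mutant_lt/1}, {"==", &RorDemo.mutant_eq/1}] do
  weak = RorDemo.passes?(fun, RorDemo.weak_suite())
  strong = RorDemo.passes?(fun, RorDemo.strong_suite())
  IO.puts("mutant #{name}: survives weak suite? #{weak}, survives strong suite? #{strong}")
end
```

Both mutants survive the weak suite and both are killed by the strong one, which is the kind of gap this style of mutation is good at pointing out.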

My interpretation of your situation is that the codebase has fallen into a mocking trap; there are real implementations that are replaced with a mock EVERYWHERE and not tested individually. Some ways to address that:

write specific tests for the thing that’s being mocked everywhere. Ideally there would be a corresponding test for every scenario that’s set up in the mocks, to demonstrate that the real thing actually does what the mocks are pretending to do.
write higher-level integration tests that don’t use mocks. For a legacy codebase, the “happy path” is a good place to start. These will be slower than isolated unit tests, so you may want to tag them and run them as a separate CI step.
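For the tagging part, a rough sketch of what I mean, using ExUnit's built-in tags. The module name and the test body are placeholders (in a real project the first line would live in test/test_helper.exs and the body would call your real entry point with no mocks involved):

```elixir
# Exclude everything tagged :integration from a plain `mix test` run.
ExUnit.start(exclude: [:integration])

# Hypothetical test module; the name is made up for this example.
defmodule HappyPathIntegrationTest do
  use ExUnit.Case, async: false

  # Tags every test in this module as :integration.
  @moduletag :integration

  test "happy path goes through the real implementations" do
    # Placeholder assertion: replace with a call into the real code.
    assert String.upcase("ok") == "OK"
  end
end
```

The separate CI step can then run `mix test --only integration`, while the default `mix test` keeps skipping these tests.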

NobbZ · November 21, 2022, 2:52pm

The “delete full module” was an extreme example.

And in this case it was indeed dead code, as we learnt today. It still leaves a bitter taste.

We hoped that even small mutations could help us identify false coverage earlier, and we could integrate it in our weekly flow to run the mutation test about once a week and tackle those that failed to fail.

Of course we also have strategies for other improvements, though as we always have to keep in mind that everything has to be paid for by someone, we can’t follow all of them at the same time and try to find quick ways to uncover the easier-to-fix things.

dimitarvp · November 27, 2022, 7:09am

There was a mini-discussion on DevTalk a while ago here – What dev-related stuff have you been up to? - #265 by davearonson - General Dev Chat - Devtalk – where the guy answered a few questions about mutation testing. TL;DR I still don’t see its value. Still, the libraries you mentioned not receiving updates might just mean they are considered finished so I’d say if you’re convinced you want mutation testing then just go for them.

(It seems to me that mutation testing boils down to: if mutated code that’s smaller than the original still passes the tests then this very likely means the code can be shrunk.)

In your case I’d say plain old observation plus adding test coverage is the best way to learn the code and gradually tighten your grip on it. It’s a long and tedious process though, and I am not aware of the realities of time and financial budgets for that project, so that advice is a bit academic, sadly.

Something that may or may not save you time – property tests. But it’s a game of chance really: it helped me once and definitely used up a ton of time a few other times after that. The one occasion it helped me was exactly when I was unsure which modules did what (because they were intertwined with 20+ others and it became difficult for a human to follow).

NobbZ · November 27, 2022, 12:51pm

No, that’s not what mutation testing is about; mutation testing is there to uncover missed edge cases.

def add(a, b), do: a + 1

test "adds one", do: assert 2 == add(1, 1)

This has 100% test coverage; still, it’s obviously missing edge cases.

Mutation testing shall help you find the less obvious ones.

In short: they shall help you to regain confidence in the tests.

It’s basically testing the tests, not your code.

And the problem with the libraries I already mentioned is that they use some internal API and do not work with Elixir 1.14.

hauleth · November 27, 2022, 3:24pm

That is not exactly why we have mutation testing. Mutation testing is something slightly different. The reason for mutation testing is testing our tests. So using your example:

def add(a, b), do: a + b

test "adds one", do: assert 2 == add(1, 1)

This is a correct test, but the thing with mutation testing is that if I change the code to:

def add(a, b), do: a - b

Then our tests should fail.

So the thing with mutation testing is that we test how well our tests catch accidental bugs that change logic. This is slightly different from property testing, which tests whether the properties hold: mutation testing checks how well our tests are written and whether we have tests that are moot (their outcome does not change even when the logic changes).
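A hand-rolled sketch of that idea, with the caveat that a real tool derives its mutants from the AST and runs your actual ExUnit suite. Here the mutants are written by hand as anonymous functions and a "suite" is just a function returning true when all its checks pass; every name is made up for the example:

```elixir
defmodule MutationSketch do
  # The single check from the example above.
  def weak_suite(add), do: add.(1, 1) == 2

  # A suite with a few more inputs.
  def better_suite(add) do
    add.(1, 1) == 2 and add.(2, 3) == 5 and add.(4, 0) == 4
  end

  # Hand-written mutants of `a + b`.
  def mutants do
    [
      {"a - b", fn a, b -> a - b end},
      {"a * b", fn a, b -> a * b end},
      {"a + 1", fn a, _b -> a + 1 end}
    ]
  end

  # A mutant survives when the suite still passes against it.
  def survivors(suite) do
    for {label, fun} <- mutants(), suite.(fun), do: label
  end
end

IO.inspect(MutationSketch.survivors(&MutationSketch.weak_suite/1), label: "weak suite survivors")
IO.inspect(MutationSketch.survivors(&MutationSketch.better_suite/1), label: "better suite survivors")
```

The single assertion kills `a - b` and `a * b` but lets `a + 1` through, so it cannot tell a correct add/2 from one that ignores its second argument. With the extra inputs no mutant survives.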

fmn · November 28, 2022, 5:43am

I think it’s even fair to say it’s quite radically different. Property testing is the opposite idea, in a way. You “play” with the inputs fed into the code under test, but you don’t modify that code.
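As a bare-bones illustration of "playing with inputs": the sketch below uses only :rand from the standard library, and the check/2 helper is something I made up for this post. In a real project you would more likely reach for a library such as StreamData, which also shrinks counterexamples.

```elixir
defmodule PropertySketch do
  # The code under test stays untouched; only the inputs vary.
  def add(a, b), do: a + b

  # Runs `property` against `runs` random integer pairs in -1000..1000.
  # Returns :ok, or the first counterexample that was found.
  def check(property, runs \\ 100) do
    Enum.find_value(1..runs, :ok, fn _ ->
      a = :rand.uniform(2001) - 1001
      b = :rand.uniform(2001) - 1001
      if property.(a, b), do: nil, else: {:counterexample, a, b}
    end)
  end
end

commutative = fn a, b -> PropertySketch.add(a, b) == PropertySketch.add(b, a) end
identity = fn a, _b -> PropertySketch.add(a, 0) == a end

IO.inspect(PropertySketch.check(commutative), label: "commutative")
IO.inspect(PropertySketch.check(identity), label: "identity")
```

Note that add/2 is never changed here, which is exactly the difference from mutation testing, where the inputs stay fixed and the code is what gets modified.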

hauleth · November 28, 2022, 8:00am

Indeed, mutation testing is property testing for your tests.

devonestes · December 7, 2022, 7:13am

You’re right that Muzak isn’t as supported as I’d like it to be (for reasons), but I am releasing a new version this weekend that works for 1.14, and should hopefully work for all future Elixir versions. I’m hopeful that I’ve gotten it off of any private APIs at this point.

Yes, the situation that you describe where you can essentially delete the “code under test” entirely and the tests still pass is basically the canonical example of what mutation testing helps with.

NobbZ · December 7, 2022, 7:19am

Thank you for letting us know about the current state. I am looking forward to the release and will experiment with it then.

PS: I’ve heard that there is a pro version of muzak, though I cannot find what it actually offers on top of regular muzak. Is there a feature comparison, so that I can try to get a budget for it once initial experiments with regular muzak have shown some actual benefits?

devonestes · December 7, 2022, 7:39am

More info on Muzak Pro is here.

The biggest thing you get is git integration so you can effectively include mutation testing as part of your CI process if you would like. It restricts the mutations generated to only the lines that have changed since the last merge commit. Since you’re only generating mutations for the LOC that have changed, you can run mutation testing on that and it shouldn’t add too much time to your CI runtimes. You can also define a custom percentage in the configuration that you’re looking to hit to indicate “success” for the run, so it works how your team wants in CI.

And of course, because the runtimes of mutation testing increase as your number of surviving mutants increases, as your test suites become better the time spent in mutation testing goes down!
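To illustrate the percentage idea in general terms (this is not Muzak's configuration or API, just a sketch with made-up names and numbers): the score is the share of mutants that at least one test killed, and the run counts as a success when that share meets the threshold.

```elixir
defmodule MutationScore do
  # Percentage of mutants that were killed by at least one failing test.
  def percent(%{killed: killed, survived: survived}) when killed + survived > 0 do
    Float.round(killed / (killed + survived) * 100, 1)
  end

  # Hypothetical CI gate: :ok when the score meets the threshold,
  # otherwise an error tuple carrying the actual score.
  def gate(results, min_percent) do
    score = percent(results)
    if score >= min_percent, do: :ok, else: {:error, score}
  end
end

IO.inspect(MutationScore.gate(%{killed: 45, survived: 5}, 85))
IO.inspect(MutationScore.gate(%{killed: 3, survived: 1}, 80))
```

The first call returns :ok (45 of 50 killed is 90%), the second returns {:error, 75.0}.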

devonestes · December 8, 2022, 7:37am

Release compatible with 1.14 is up here: muzak | Hex