Excoveralls library: "Code coverage objectives inhibit quality improvement"

Looks like this is an actively maintained and valued library (380 stars): GitHub - parroty/excoveralls: Coverage report tool for Elixir with coveralls.io integration.
However, I see no disclaimer about the risk of coverage objectives inhibiting quality improvement, or about coverage having no correlation with code quality.

Dave Thomas in Programming Elixir:

Test Coverage
Some people believe that if there are any lines of application code that haven’t
been exercised by a test, the code is incomplete. (I’m not one of them.) These
folks use test coverage tools to check for untested code.
Here we’ll use excoveralls to see where to add tests for the Issues app.

related: BDD / TDD criticized

Also worth a read for all good :wink: lib maintainers on GitHub who provide extensive unit tests (and who might lead others to follow their example and add extensive unit tests to their own code):

I found that not only were unit tests extremely brittle due to their coupling to 
volatile implementation details, but they also formed the wider base of the
regression pyramid. In other words, they were a pain to maintain [..]  I set 
forward a simple guideline for the developers: Always give preference to 
integration/system tests over unit tests. I didn't ban unit tests—that would 
be stupid—but reserved those for special cases. [..]
Give up unit tests and get results
We are now three years into our product and have been getting tremendous 
value from our automation approach. Try it yourself. Not only will giving up 
on unit tests not hurt development experience or product quality, but it will 
also allow your team to focus efforts on the system and integration level 
tests that provide you with the biggest bang for the buck.

Follow up: Speed up your CI pipeline: Throw out your tests | TechBeacon

1 Like

Of course unit testing and code coverage can be misused, but I believe they are great tools to have. In functional programming these tend to be pretty easy to maintain as well. Again, use your judgment: don’t blindly test everything, and don’t blindly require 100% (or any other percentage) coverage.

A unit test up front is, for me, the quickest tool to get a simple function right. Often I don’t even care to keep the test once it is written, as it will get swallowed by a “larger” functional test later on. But “write code, compile, test in the REPL, fail, rewrite code, compile, test in the REPL” only works for simpler stuff.

Note that I don’t say TDD, as I don’t actually use the tests to design my application/module/function. I just use a test to write down the specification of whatever I am writing and then make it pass. I really should use more property-based testing here but have not gotten into the habit of doing it yet.
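For illustration, a property-based test along those lines might look like this; it is only a sketch and assumes the stream_data library is available as a test dependency:

defmodule SortPropertyTest do
  use ExUnit.Case
  use ExUnitProperties

  # Instead of a handful of hand-picked examples, state a property that
  # must hold for any generated input.
  property "sorting preserves length and is idempotent" do
    check all list <- list_of(integer()) do
      sorted = Enum.sort(list)
      assert length(sorted) == length(list)
      assert Enum.sort(sorted) == sorted
    end
  end
end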

As for coverage, I use the reports to see if there are any obvious cases in the specification that I have not tested or even written code for. I don’t care too much about a certain percentage, just that I have tests for the specified use cases.
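For reference, a typical excoveralls setup in mix.exs looks roughly like this (following the shape of its README; the version constraint is only illustrative):

def project do
  [
    app: :my_app,
    version: "0.1.0",
    deps: deps(),
    test_coverage: [tool: ExCoveralls],
    preferred_cli_env: [coveralls: :test, "coveralls.html": :test]
  ]
end

defp deps do
  [
    {:excoveralls, "~> 0.18", only: :test}
  ]
end

Running mix coveralls.html then produces a report highlighting uncovered lines, which can be read in the spirit described above: as a list of cases to either test or question.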

If unit tests are brittle, they are not good unit tests. They should test specific things only. I only find them useful as “leaves” in the code tree, i.e. implementation details. They should be close to the code (which is why I like to have unit tests in the same file as the module). Most often functions are straightforward enough that they don’t warrant their own unit tests and can be tested through their intended APIs.
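In Elixir, doctests are one idiomatic way to keep a small unit test right next to the code it exercises; a sketch (the module and function are made up for illustration):

defmodule Slug do
  @doc """
  Turns a title into a URL-friendly slug.

      iex> Slug.from_title("Hello, World!")
      "hello-world"
  """
  def from_title(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9\s]/, "")
    |> String.split()
    |> Enum.join("-")
  end
end

defmodule SlugTest do
  use ExUnit.Case
  doctest Slug
end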

The more important functional and integration tests, which test your API (and hence should definitely not be brittle), live outside of the module in common tests or another test framework. In my opinion you should also not test “implementation” from these tests; that is when things become brittle.

I write most core code to be as side-effect free as possible to help with testing. This sometimes makes the code more awkward, since all side effects are isolated at the edges, but it makes testing the application much easier. I’ve found that some people coming from imperative/OO languages struggle with this.
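A minimal sketch of that separation (all module and function names here are hypothetical):

# Pure core: no side effects, trivial to unit test with plain assertions.
defmodule Billing.Core do
  def charge(%{balance: balance}, amount) when amount <= balance do
    {:ok, %{amount: amount, new_balance: balance - amount}}
  end

  def charge(_account, _amount), do: {:error, :insufficient_funds}
end

# Edge: the side effects (database reads/writes) are isolated here and
# covered by a few integration-style tests.
defmodule Billing do
  def charge(account_id, amount) do
    account = Accounts.fetch!(account_id)

    case Billing.Core.charge(account, amount) do
      {:ok, result} -> Accounts.apply_charge(account_id, result)
      {:error, _} = error -> error
    end
  end
end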

Even he summarizes that code coverage is useful as long as you don’t blindly specify coverage as an objective. I fully agree. In fact, isn’t the saying that any metric which is turned into a target ceases to be a good metric?

It is not integration tests vs. unit tests. They test different things. Both are generally needed, but I believe integration tests are more important.

Again, I wish I knew property testing better, and I plan to spend quite a bit more time there to learn it properly.

2 Likes

Plus, if your unit tests are brittle and breaking all the time, then it is time to seriously evaluate how you write unit tests. Most of the codebases I have worked with that have a poor test suite get there either by excessively testing implementation details (sometimes misguided by the 100% coverage goal) or by abusing mocks (which I have already talked about in the past).

I have seen this dance happening many times:

  1. Write unit tests focused on implementation details or around mocks → Find out that your unit tests are brittle → Remove them
  2. Integration tests are now used to test all possible different combinations of your software → Find out that your integration tests are slow → Remove them
  3. Your application is now without proper coverage → Users have now become your testers and they started finding bugs → Go back to step 1 or step 2

The key is to find the balance. Use unit tests for asserting behaviour and not implementation. Use integration tests to assert the stack works as expected (and not to assert on all possible different outcomes). In rare cases I skip writing tests too, when doing so would be extremely expensive or very hard to do (the IEx CLI startup comes to mind as an example). But then I tend to write code comments explaining the reasons and things to assert when verifying manually.

11 Likes
Use unit tests for asserting behaviour and not implementation.

Agreed. What I often see in Elixir libraries on GitHub is assertion of implementation: call a function with a certain input and see if the expected value is returned. But asserting behaviour, is that unit testing (a unit being a function/procedure/method) or is it (often) functional/integration testing, where more “units” may be involved?
On brittleness: http://www.readytorocksd.com/the-brittleness-of-unit-tests/
What I mostly did in the past was write (almost) end-to-end tests for large refactorings. They gave the most ROI. Slow but very informative.

1 Like

I agree with everything you said about the 100% coverage objective being bad. But I just thought of one situation in which coverage tools could be useful: what if, instead of adding tests to match the lines that are not covered, we remove those lines by pattern matching or something like that?

Maybe that is a good way to use those tools. I haven’t thought too much about it, I just had the idea, but what do you think?

what if instead of adding tests to match the lines that are not covered, we remove 
those lines by pattern matching or something like that?

I do not get what you mean, can you explain?
I find coverage of functionality more important than coverage of code (lines). For the last large refactoring of an application module I wrote an end-to-end test. That refactoring meant one year of work for me personally, plus the work of a tester and customers doing acceptance tests, and of others explaining to me the supposed functionality of some undocumented piles of mud :wink: . The approach was entirely fit for the use case.

We had a lot of tester screen input (the tester knowing customer habits well), and we could retrieve that input from the database for lots of use cases. This input (say, in JSON format) could be fed to the application case by case, automated (both the refactored and the non-refactored version), mostly at night, and the output was written to files that could also be compared automatically. Differences had to be inspected; often the bugs turned out to reside in the non-refactored version. I did not have to constantly rewrite tests because of function (signature) changes or deletions. Overall this project was a big success; customers of this company are a lot more satisfied with the stability of this module now.
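A rough sketch of the comparison harness described above (every module name is hypothetical, and Jason is assumed for JSON encoding):

defmodule RefactorCheck do
  # Feed the same captured input to the old and the new implementation
  # and write both outputs to files for automated comparison.
  def run(case_ids) do
    for id <- case_ids do
      input = CaseStore.fetch_input!(id)

      old = LegacyModule.process(input)
      new = RefactoredModule.process(input)

      File.write!("out/#{id}_old.json", Jason.encode!(old))
      File.write!("out/#{id}_new.json", Jason.encode!(new))

      if old != new, do: IO.puts("difference in case #{id}")
    end

    :ok
  end
end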

A silly example. Given the code and its test:

defmodule Foo do
  def bar(value) do
    if is_integer(value) do
      :ok
    else
      raise "argument error"
    end
  end
end

defmodule FooTest do
  use ExUnit.Case

  test "bar does what it does" do
    assert Foo.bar(1) == :ok
  end
end

The coverage tool would not report 100%, since the tests do not cover the else branch. Most people would argue that you then need to add this assertion to the test:

  test "bar does what it does" do
    assert Foo.bar(1) == :ok
    assert_raises RuntimeError, -> Foo.bar("test") end 
  end

But as you mentioned, maybe that’s not part of your functionality, and to really test it you would have to test other types too, something not really relevant for this functionality.

So, what if instead of adding more examples to the test when the tool reports it’s not at 100%, we adapt our actual code to meet only what’s being tested? Like this:

defmodule Foo do
  def bar(value) when is_integer(value) do
    :ok
  end
end

defmodule FooTest do
  use ExUnit.Case

  test "bar does what it does" do
    assert Foo.bar(1) == :ok
  end
end

If it does not make sense to have a test for it, why would it make sense to have code for it? If it’s not a functionality of the function, why does that line of code even exist?

Did you get it? We have pattern matching, guard clauses and other tools to help us care only about the happy path… so let’s use them! And I think the coverage tool can help us identify some of these cases.

Of course there are times when we need the code to handle it, but then, how can you say that the failure handling is not part of your functionality? You actually coded it.

I don’t think pursuing high coverage makes your code or tests better by itself; it’s what you do with the data it gives you that matters. Of course having everything covered doesn’t mean much if whoever uses the tool uses it badly, but then you could make this argument about practically every tool.

But I haven’t thought too much about it yet; maybe someone can think of a downside of doing it, or maybe the case I mentioned is too silly and doesn’t really happen that often in real life. I guess I’ll try it out on my next side project to see how it goes, and I’ll share my thoughts here afterwards.

I would use the pattern matching variant, but for reasons others have explained (easy to find somewhere ;-)).
But you did not test an unhappy path in the pattern matching variant.

Yeah, but the code is also not defining the unhappy path, and, therefore, the tool would give 100% coverage.

1 Like

Ok, haha.

1 Like

100% coverage of what?
Lines of code?

In lots of complex business applications, a function/method cannot be covered easily by one or two tests. 100% line coverage doesn’t make sense there.

1 Like

That’s true, and I think one should not pursue that. But I don’t think this argument invalidates the use of coverage tools at all. They can be a good indication that you have code that should either be tested or be removed.

I believe there is a misunderstanding of what “testing implementation details” means, so I will be more verbose here and clarify that I was referring to “tests that are coupled to a particular implementation”.

Tests that are coupled to an implementation are the ones that break when the implementation is rewritten but the functionality is precisely the same.

Under this definition, tests that verify the input and output are usually not coupled to a particular implementation. We assume the input and output typically outline the expected behaviour of a function (unless there are side effects). In fact, it is actually harder to couple to implementation details in Elixir because you can’t invoke private functions. We are also more restricted in the use of mocks and stubs, which helps too.

But, for example, there has been a recent request to add a feature to Phoenix that allows you to check that a certain route would invoke a certain controller/action. To me that’s not a useful test at all. It doesn’t guarantee the controller exists, it doesn’t guarantee the action exists, it is just asserting an implementation detail. I may refactor the code, have the router point to a new action that has the same functionality, yet my suite will fail.
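As a sketch of the difference (the route, controller and page content are invented here): instead of asserting which controller/action a route maps to, a request test exercises the route, controller and view together:

defmodule MyAppWeb.UserControllerTest do
  use MyAppWeb.ConnCase

  # Fixture setup omitted. This passes only if the route, the controller,
  # the action and the rendered template all exist and work together.
  test "GET /users/:id shows the user", %{conn: conn} do
    conn = get(conn, "/users/1")
    assert html_response(conn, 200) =~ "User profile"
  end
end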

8 Likes

All of the code we write must be tested; it is our responsibility as engineers to do it.

No test = don’t deliver the code.

Six likes for your comment, I’m almost jealous! But no critical reaction, grrrr, what kind of community is this?
A piece of functionality may be moved to another module (or to another application). The signature of a function could be changed. It could be removed because it appears to be a duplicate. Moreover, when building software you are often not (only) refactoring. My point was brittleness and ROI.
I believe there is a misunderstanding of what “brittleness” means, so… Well, brittleness is not strictly defined in a software-jargon dictionary either, so I already posted a link to how I understand what is meant in the articles:

What I mean, is that the addition of unit tests make the whole system more brittle. [..]

So you give up agility.

http://www.readytorocksd.com/the-brittleness-of-unit-tests/:

I found that not only were unit tests extremely brittle due to their coupling to 
volatile implementation details [..]

https://techbeacon.com/speed-your-ci-pipeline-throw-out-your-tests.

Now, do you dare to like this post? :wink:

Also:

In this study, we reveal that because testing plug-in-based software is complex and challenging,
many developers focus on unit testing and rather circumvent higher level testing activities, 
such as integration or system-level testing.

https://repository.tudelft.nl/islandora/object/uuid:34a51f6a-a286-42ed-b9f1-2c9221b10ec3/datastream/OBJ/download

Warning: dissertation, academic content, 210-page download

I will quote part of the conclusion in the last article of the series here:

What we’ve learned here is this: in terms of stability and robustness, design trumps unit tests.

There is absolutely no discussion from me here. Note, however, that the conclusion is not “do not write unit tests”, it is “do not write brittle unit tests”.

It feels like this is being posed as a zero-sum game: either you design correctly or you write unit tests. But it isn’t. Design the code, define the contracts and then verify those contracts.

What does a password validator do? Does it validate passwords in general or does it specifically validate that the password length is 3? If the minimum length was actually hard-coded to 3 for 5 long years and never changed… does it then matter that the code is hard to change once every 5 years? Should you optimize software to be changed? Do you optimize it to be replaced? Which assumptions could be made when the system was originally designed?

Also, I very much doubt that the problem he pointed out was specific to unit tests. Imagine you have a system that performs password validation when creating an account. The minimum password length is 6. What are the odds that, when writing an integration test for this feature, the developer chose as input a password with a length of exactly 6 (and most likely “123456”)? Won’t the integration test also break when you increase the limit to 8? Even worse, we cannot pass the password validation parameter in an integration test! What does this say about integration tests? How will you react to this? Does it happen frequently?
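A small sketch of that password example (the module, the :min_length option and the values are assumptions made for illustration):

defmodule Passwords do
  @default_min_length 6

  # The minimum length is part of the contract, so it can be passed in
  # rather than buried in the implementation.
  def valid?(password, opts \\ []) do
    min = Keyword.get(opts, :min_length, @default_min_length)
    String.length(password) >= min
  end
end

defmodule PasswordsTest do
  use ExUnit.Case

  # Asserting right at the boundary couples the test to the current default,
  # whether this is a unit test or an integration test.
  test "rejects passwords shorter than the default minimum" do
    refute Passwords.valid?("12345")
    assert Passwords.valid?("123456")
  end

  # Making the limit explicit keeps the contract visible and the test
  # stable when the default changes.
  test "the minimum length is configurable" do
    refute Passwords.valid?("1234567", min_length: 8)
    assert Passwords.valid?("12345678", min_length: 8)
  end
end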

For Elixir itself and for libraries, the module and the function name are part of the contract. If I accidentally rename Enum.map/2 to Enum.mapo/2 then I absolutely want my tests to break.

I am glad to concede though that most users are not writing languages or libraries, and the requirements in applications are very different. But I also think this is part of the problem: we are really bad at defining contracts between the different parts of our systems, so we end up writing unit tests for all individual parts of the system, and then changing all of those parts at once when implementing a new feature.

I won’t disagree that there is a tendency to abuse unit tests, either by writing too many or by coupling to implementation details (like a test that invokes a handle_call/3 directly). I have also seen a tendency to abuse integration tests, leading to systems with extremely slow test suites, which cripples the team’s motivation and the development-feedback cycle.
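For instance, here is a test coupled to a callback versus one that goes through the public API (the Counter module is invented for the example):

defmodule Counter do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def value(pid), do: GenServer.call(pid, :value)
  def increment(pid), do: GenServer.cast(pid, :increment)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast(:increment, state), do: {:noreply, state + 1}
end

defmodule CounterTest do
  use ExUnit.Case

  # Coupled to the implementation: breaks if the state shape or callbacks
  # are reorganised, even though the observable behaviour is unchanged.
  test "handle_call returns the state" do
    assert Counter.handle_call(:value, self(), 41) == {:reply, 41, 41}
  end

  # Coupled to the behaviour: survives internal rewrites.
  test "incrementing is visible through the public API" do
    {:ok, pid} = Counter.start_link(41)
    Counter.increment(pid)
    assert Counter.value(pid) == 42
  end
end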

Still, I don’t find the answer is “do not write unit/integration tests”.

5 Likes
Still, I don’t find the answer is “do not write unit/integration tests”.

The article referenced in my first post (The No. 1 unit testing best practice: Stop doing it | TechBeacon) says:

 I didn't ban unit tests—that would be stupid—but reserved those for special cases.

Not writing integration tests is not proposed at all.

Indeed it was not. I was just saying that in my experience teams have abused integration tests, leading them to eventually avoid writing them too. Sorry for the confusion.

Also worth a read: http://250bpm.com/blog:40 “Unit Test Fetish”
(website of Martin Sústrik, creator of ZeroMQ; he has more interesting articles on this site, e.g.:
http://250bpm.com/blog:105 “On Intellectual Honesty”

It was only lately that I realized that I can phrase "intellectual honesty" in more precise fashion. 
That it can be defined as an ability (or willingness) to remain in the state of cognitive dissonance.
[..]
It's easy to see how scientists are confronted with cognitive dissonance. They've invested their
time and credibility in this research and now there's this new paper that contradicts it. If they 
admit that the paper is right they are in essence admitting that they've been stupid before. 
And that's not a pleasant thing to do. It's much easier to dismiss the new research as stupid. [..]

)

2 Likes