BDD / TDD criticized


Dan Abramov @dan_abramov, 19 hours ago

In particular I’m curious about the emotional investment into a particular testing methodology. Could it be because they’re all not that great, and so we develop devotion in order to feel our choice is justified?

Becca Nelson Bailey @beccaliz, 18 hours ago

I think many of us have been told that good testing practices are what separates “good” programmers from “bad” programmers. We have given testing an almost moral value, but we don’t have an absolute standard to judge by, so the standard becomes our own practices.


A plea to do testing:

It took about five years until I understood how testing works (for me), and I can’t imagine doing any programming without it anymore. My key to success with testing was to see it not as something I do in addition to my code but simply as part of it. I write tests first because I first want to define how my not-yet-written code can be used. If it is hard to think of a simple test for a new function, I first refactor my existing code so that I can come up with a simple test. Because if it’s hard to write a test for something, most likely that code will be hard to understand and use for other programmers (or for myself in the future).
So the key point here is not to even think about how I will implement the function before I have a definition of how it will be used. For example, if I have in mind that I will use a struct for something, the tests often drive me to use a simple list instead.
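To make the test-first idea above concrete, here is a minimal sketch in Python. It is not taken from anyone's project in this thread; the function name `total_price` and the tuple layout are assumptions made up for illustration. The test is written first and pins down how the function is called, and the call site ends up wanting a plain list rather than a dedicated record type.

```python
import unittest


# Step 1: written first. The test fixes how the (hypothetical) function is used.
class TotalPriceTest(unittest.TestCase):
    def test_empty_cart_costs_nothing(self):
        self.assertEqual(total_price([]), 0)

    def test_sums_unit_price_times_quantity(self):
        # A plain list of (name, unit_price_in_cents, quantity) tuples is
        # all the caller needs; no dedicated cart type is required.
        cart = [("apple", 50, 2), ("pear", 30, 1)]
        self.assertEqual(total_price(cart), 130)


# Step 2: the simplest implementation that satisfies the usage above.
def total_price(cart):
    """Return the total in cents for a list of (name, unit_price, quantity)."""
    return sum(unit_price * quantity for _name, unit_price, quantity in cart)
```

Saved in a file, this can be run with `python -m unittest`.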

When I have to continue on code I haven’t touched for a while, I first run mix test --trace --seed=0 and immediately have a specification of the status quo.

When I look for something on GitHub, I first read the tests to understand how I can use and eventually change or extend that code. If there are no tests, or if I can’t understand the tests, I skip that repository.

Other than that, of course, I feel much more comfortable refactoring existing code when there is a trustworthy test suite.

I gave up arguing about TDD; I just do it and enjoy a much less stressful life.

A plea for requirements engineering before coding: BDD / TDD criticized. Lamport makes the distinction with TDD clear in his paper.


As I wrote, testing is part of my way of developing applications. This doesn’t mean that I don’t do requirements engineering up front. Just the opposite is true. But, when I start coding, I start with a test.

Often the tests, or the fact that it is hard to find a proper test, show me where my upfront design has flaws in the details. In that case I tend to rework my blueprint rather than question whether I’ve wasted time writing the tests.

Starting without a blueprint is wrong. Doing it without testing is wrong. Testing just for the sake of coverage is wrong. Strictly following a blueprint at any price is wrong. Doing it right takes a lot of practice, and failure is more likely than success.

So, I totally support your plea for requirements engineering :wink: but at the same time, I try to encourage people to do proper testing.

Can I hope that we can agree that one is just as important as the other?

This can happen when you’re finding out what you really want. Testing shows “how sloppy your thinking is”. It happens to me as well, although I discover by far the most requirements without testing. My tests are often end to end, as those give the most ROI. I am going to put a small JS thing on GitHub in a short while. The code contains an extensive list of requirements and, as far as needed, comments explaining the how. It does not have an extensive test suite. The code should be understandable by reading the requirements, the code itself and the comments in the code. Tests would not add value with respect to understandability, I think. I would accept, and maybe even write, tests per requirement if they did not hamper refactoring and did not cost so much extra time. I would not trust a test suite that much, as the guarantees given by its results are not quite 100%.

Discussion on twitter with Ron Jeffries etc.

Use of Formal Methods at Amazon Web Services, a paper from Amazon:

Kent Beck:

Have fun reading the reactions on HN :wink:

This psychological concept of intellectualization is recognizable in quite a few other discussions, of course. :wink:



Found the comments far more interesting than the article, which for some reason felt to me like suggesting that straightening one’s path is the solution to this problem:


¯\_(ツ)_/¯

Someone also linked to this: Just Say No to More End-to-End Tests (2015)

I do not understand what you mean. I found the article good in that it values return on investment and sees the least ROI in unit testing, generally speaking. Like Coplien (see the very first message in this thread with the link to his ), and like what Rich Hickey says in that first message as well. Concerning the link you sent: I have read more than once about the “pro unit-test” culture at Google. Not every company has the same culture; Facebook, for example, has a totally different testing strategy. Here is someone from Google looking for another company to work for and interviewing them about their testing strategy (don’t expect much about those other strategies; you can easily find something about, e.g., Facebook online):

Director of engineering at Facebook (and not quite a dummy) Erik Meijer says

in that it values return of investment and sees the least roi in unit testing.

To me the yardstick the article applies seems to follow thinking along these points:

  • Tests are a byproduct of arriving at the end product. The end product has all the value, tests have none of it.

Harsh but true.

  • Tests contribute no direct value, so their cost needs to be minimized.

This is where things get slippery. It is easy to simply measure money spent; it is much more difficult to assess whether the money is being spent in the right place, i.e. effectively.

For example, not writing unit tests saves money, but what about the cost of time spent debugging, just trying to find the root cause of a defect that was uncovered by end-to-end testing? Granted, creating unit tests at too low a level of granularity can be expensive, but having something in place that can narrow the search, or better yet catch the defect before it gets to e2e, can also save money. I’m not convinced that e2e-and-fix is always and consistently cheaper.

  • e2e tests are closest to running in production, therefore it is self evident that they are the most valuable tests.

Nobody is arguing that e2e isn’t necessary. But even e2e can miss part of a system’s operational profile and for greenfield systems the operational profile is often just a guess. So unit testing is part of a “defense in depth” strategy - try to catch problems as early as possible and have multiple points where problems could be detected if one of the lines of defense fails. But any type of redundancy incurs cost, the issue is whether it is actually worth it.

The other aspect that seems to get no consideration is the economic risk/cost of a defect making it out into the wild. Not all defects are created equal. I wouldn’t consider most of the services of Netflix, Facebook or Google as mission critical. Big deal if somebody can’t get movies for a few hours or even a few days - even if they lose a few customers, they have so many of them - and even accidentally exposing a few thousand credit card numbers seems to be survivable for them. Other organizations may be considerably more vulnerable to the effects of defect exposure.

  • e2e is the most valuable so that is the only testing that should be done
  • Hence e2e has the best ROI

I do not understand what you mean

Implicitly the article suggests that unit tests are unnecessary detours that cost money therefore the solution is to stop taking detours and move straight to the goal of the working, successful system - which retroactively informs what the complete set of e2e tests should be (for due diligence).

NATO Software Engineering Conference 1968, page 21:

Kinslow: The design process is an iterative one.

Why is software development iterative? I think because while we’re still at point A we may think we know where point B is, but usually we’re wrong. So often the only way to progress is to move forward a bit and then re-assess, with any new information that has been gathered, whether we are moving in the right direction. That type of progress rarely moves in a straight line. Big Design Up Front tried to discover the straight line to B - up front - while staying mostly at A, but then often failed to learn where B actually is, or that they actually needed to go to C.

Also unit testing serves a somewhat broader objective than e2e testing. Unit testing is also supposed to inform design (usually to enable refactoring to reduce volatility of the marginal cost for the next feature). But like any tool, unit testing can be underused, overused, misused and abused.

It also doesn’t help that there doesn’t seem to be a universally shared understanding of what unit testing actually is (my interpretation was largely informed by Working Effectively with Legacy Code (2004) - i.e. building systems that are easier to change in the face of continually changing business requirements).

like what Rich Hickey says

He prefers to rely more on thinking (Hammock Driven Development) and describes TDD as “banging into guard rails”. Though doing anything in a mindless manner is rarely productive …


Not harsh in everyone’s perception (mine, for example). While flying to the moon a rocket throws off parts that are no longer needed, but they were necessary to arrive at the moon. The dumping was necessary also. :wink:

I have no formal proof, and indeed you will have to spend more time finding the root cause of a defect than with a unit test, but in general I have found writing e2e tests much more valuable. E2e and unit tests test different things. An e2e test tests real user scenarios and the integration of components. Unit tests can be used as a complement, as you and others have explained. OK, we know all that. A nice extra is that e2e tests can be used with legacy code that is not neat. The problem I see is what Martin Sustrik calls “Unit Test Fetish”:
The far cry of Uncle Bob: “if you don’t do TDD, you are unprofessional”. All that filling up of public repositories with unit tests because otherwise the authors might be taken for unprofessional. Not throwing away tests that were only of value during initial development or while learning the language. Not getting or wanting a job because Uncle Bob is standing at the door again.

You have a point here. Certainly with parts of safety critical software you have to test rigorously (formal verification) before releasing to production.

My idea. Anyone reading this doing shift-right testing btw? Any stories to share?


Funlist addition:

While flying to the moon a rocket throws off parts that are no longer needed, but they were necessary to arrive at the moon. The dumping was necessary also. :wink:

And with the Space Shuttle the SRBs were not discarded. While they never themselves made it into orbit they were always an integral part of deploying the shuttle into orbit.

A nice extra is that e2e tests can be used with legacy code that is not neat.

I was once pulled into a project where the maintenance team of a legacy application hit a wall. Real world usage kept uncovering edge cases that the e2e testing didn’t account for. But even after adding additional tests they still had a devil of a time tracking down the root cause of these defects. Typically the defect was detected so far downstream that they often got “lost” when they tried to backtrack to the root cause with the debugger.

Unsurprisingly the application was a Big Ball of Mud so I attacked the only boundary that I could identify - between the application and the database.

It’s easy enough to simply trace that type of data at the driver level but typically you can’t arbitrarily instrument the data to identify the full context of that data. So I pulled all the DB code into a separate layer - in such a way that a shim-layer could be inserted in the development environment. That shim layer could then be augmented to log and instrument all the data that is moving back and forth between both sides.
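A rough sketch of what such a shim layer could look like, in Python with an in-memory SQLite database. This is not the code from that project; the class names `SqliteGateway` and `RecordingShim`, the `orders` table and the `context` label are all invented for illustration. The point is only that once every DB call goes through one layer, a wrapper with the same interface can record the traffic together with whatever context you choose to attach.

```python
import sqlite3


class SqliteGateway:
    """Hypothetical DB layer: the only place that talks to the database."""

    def __init__(self, connection):
        self._connection = connection

    def execute(self, sql, params=()):
        return self._connection.execute(sql, params).fetchall()


class RecordingShim:
    """Development-only wrapper with the same interface as the gateway.

    It forwards every call and records the statement, the parameters,
    the returned rows and a free-form context label.
    """

    def __init__(self, gateway, context="unlabelled"):
        self._gateway = gateway
        self.context = context
        self.traffic = []

    def execute(self, sql, params=()):
        rows = self._gateway.execute(sql, params)
        self.traffic.append(
            {"context": self.context, "sql": sql,
             "params": tuple(params), "rows": rows}
        )
        return rows


def open_orders(db, customer_id):
    """Application code: unaware of whether it got the gateway or the shim."""
    return db.execute(
        "SELECT id, total FROM orders WHERE customer_id = ? AND status = 'open'",
        (customer_id,),
    )


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT, total INTEGER)"
)
connection.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 7, "open", 120), (2, 7, "closed", 80)],
)

db = RecordingShim(SqliteGateway(connection), context="customer 7, passing test")
rows = open_orders(db, 7)
```

After a run, `db.traffic` holds the recorded traffic, which can be compared between a passing and a failing test.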

Once that was in place it was much easier to reason out what the application was actually doing. The team could run a passing test and see what “healthy data traffic” looked like. Based on that they could formulate a working hypothesis of what “healthy data traffic” should look like for a new, failing test. When they encountered a deviation, they investigated typically with one of two outcomes:

  • Their hypothesis was flawed. They adjusted their hypothesis and carried on examining the data traffic.
  • Their hypothesis was correct and led them straight to the root cause of the defect.

In the end this tactic allowed them to move forward when (e2e + debugger) failed them.

I came to the conclusion that often (e2e + debugger) simply doesn’t provide enough “developmental observability”.

Michael Feathers responded to Why Most Unit Testing is a Waste:

The Flawed Theory Behind Unit Testing (2008)

Unsurprisingly none of this convinced Coplien but the article makes some interesting points. First:

  • One very common theory about unit testing is that quality comes from removing the errors that your tests catch. … It’s a nice theory, but it’s wrong.

So not only is there rampant confusion about what exactly a “unit” is, as far as Feathers is concerned “testing” isn’t the primary objective of “unit testing”. He practices it to improve code quality - showing correctness is a mere side effect.

Code quality is a somewhat nebulous and subjective concept. There are software metrics to measure code characteristics (e.g. cyclomatic complexity) and some people establish limits on these metrics in order to narrow down what “code quality” actually is but the whole business has an air of educated arbitrariness about it.

But there is a hint in the preceding paragraph about the code quality that Feathers is after:

  • The problem that I saw with the mock object approach was that it only tested individual classes, not their interactions. Sure, the tests I wrote were nominally unit tests, but I liked the fact that they occasionally tested real interactions between a class and its immediate collaborators. Yes, I liked isolation but I felt that this little tiptoe into integration level testing gave my tests a bit more power and a bit more strength.

For me this is my second encounter of a “high profile” Detroit School (Classical) TDD practitioner expressing some skepticism about the London School (Mockist) TDD use of mock objects. The first was Martin Fowler:

But in the end they play nice and don’t press the point. But I think that was a mistake.

For example a good talk about keeping the volume of unit tests under control is Sandi Metz’s The Magic Tricks of Testing. But from Feathers’s perspective the problem is already apparent in the title - talking about unit tests and testing - not code quality.

Recently you posted about Mark Seemann’s work (thank you for that - I think anybody who has finished watching Scott Wlaschin’s talks needs to continue with Mark Seemann’s talks).

In Functional architecture - The pits of success he credits Jessica Kerr with the concept of isolation:

Inspired by Haskell, Mark Seemann pushes isolation to purity when he talks about Dependency Rejection. I’ve seen the idea of the impure-pure-impure sandwich somewhere before, though I can’t recall where; it’s an interesting approach, though I’m not sure how it would fare in large scale software development. There isn’t a lot of commercial Haskell software development, and imperative programming is still possible in Haskell due to the monadic do syntax.

However the extreme of purity is still instructive to demonstrate what isolation is after. Mutability by default which is typical of OO undermines purity as anything can retain state. But isolation does severely constrain what can influence a “unit’s” behaviour.
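For readers who haven’t seen the impure-pure-impure sandwich, here is a minimal Python sketch of the shape. It is my own illustration, not Seemann’s code, and Python cannot enforce purity the way Haskell does; the names `apply_discount` and `process_order` and the discount rule are assumptions. All input is gathered first, a pure function makes the decision, and the result is written out at the end.

```python
def apply_discount(order_total, is_returning_customer):
    """Pure core: the result depends only on the arguments (amounts in cents)."""
    if is_returning_customer and order_total >= 100:
        return order_total * 90 // 100
    return order_total


def process_order(read_order, write_invoice):
    """The sandwich: impure read, pure decision, impure write."""
    order_total, is_returning = read_order()                 # impure: gather input
    amount_due = apply_discount(order_total, is_returning)   # pure: decide
    write_invoice(amount_due)                                # impure: publish result


# The pure part needs no test doubles at all:
assert apply_discount(200, True) == 180
assert apply_discount(200, False) == 200
assert apply_discount(50, True) == 50

# The thin impure shell can be exercised with trivial stand-ins:
sent = []
process_order(lambda: (200, True), sent.append)
```

Here `sent` ends up holding the single amount passed to the writer, and only the thin shell needed any stand-ins.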

I suspect that the code quality that Feathers is after is isolation and the units he works with are units of isolation. Units of isolation could be a single class but often are a collection of collaborating classes. So there is no directive to write unit tests for every single class.

If so, unit tests demarcate the boundary of the unit of isolation and the assertions monitor the health of that unit while development progresses.

With classical unit testing there is significant design pressure to optimize the boundaries of the unit of isolation - otherwise the developer would have to write too much code for test doubles.

Mockist unit testing significantly reduced the penalty for needing test doubles and therefore removed the pain from assuming that the class boundary is the boundary of the unit - it became much easier to stop thinking (find out where the boundary should actually be) and lose the primary benefit (code quality) of unit testing.

That doesn’t mean mocking libraries/frameworks are bad. Mocking is essential for dealing with boundaries that you have no control over. But there should always be a very good justification for mocking your own code. And there is lots of material about mocking as a design smell (When is it safe to introduce test doubles?).
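As a small illustration of that distinction (my own sketch in Python, with made-up names such as `Checkout`, `Cart` and `FixedRates`), here the unit of isolation is two collaborating classes. The test goes through the unit’s boundary, the collaborator inside the unit stays real, and only the dependency outside our control (an exchange-rate service, assumed here) is replaced by a hand-written double.

```python
class FixedRates:
    """Test double for an external rate service we do not control (assumed)."""

    def __init__(self, table):
        self._table = table

    def rate(self, currency):
        return self._table[currency]


class Cart:
    """Internal collaborator: part of the unit, so it is not mocked."""

    def __init__(self):
        self._lines = []

    def add(self, unit_price_cents, quantity):
        self._lines.append((unit_price_cents, quantity))

    def subtotal_cents(self):
        return sum(price * quantity for price, quantity in self._lines)


class Checkout:
    """Boundary of the unit of isolation; the rate source is injected."""

    def __init__(self, rates):
        self._rates = rates

    def total_cents(self, cart, currency):
        return round(cart.subtotal_cents() * self._rates.rate(currency))


# One test exercises Checkout and Cart together through the boundary.
cart = Cart()
cart.add(1000, 2)
cart.add(500, 1)
checkout = Checkout(FixedRates({"USD": 1.25}))
assert checkout.total_cents(cart, "USD") == 3125
```

If `Cart` had been mocked as well, the test would mostly restate how `Checkout` calls it rather than check what the two do together.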

Classical TDD attempts to discover the appropriate units of isolation which in turn should lead to better code (as in easier to change later).

These days when unit testing is taught, mocking libraries like sinon.js are introduced before one gets to the point of thinking:

How can I organize this code so that I don’t have to create so many test doubles?

So if one accepts a predominant practice of degenerate unit testing which uses mocks heavily and focuses on “testing” rather than on code quality as manifested by appropriately isolated functional capabilities, then these reports of the failure of unit testing aren’t surprising.


Reminds me of what Lamport said BDD / TDD criticized :

In the first academic paper “On the Relation Between Unit Testing and Code Quality” (you can find the pdf on , it has the quoted title) no or only a weak relation is found. I have not read the paper in detail, but the result does not surprise me.

I could write more about this, but have to do some work now. With Hickey I think

Thanks for your reaction.

See Feathers’s closing statement in The Flawed Theory Behind Unit Testing:

Quality is a function of thought and reflection - precise thought and reflection. That’s the magic. Techniques which reinforce that discipline invariably increase quality.

(Untimely meditations about software engineering)

If I have a clear spec to work from before I start coding, then yes, I will typically get the tests in place before I start working.

However, oftentimes I’ll iterate quite a bit on the design of a solution. Only once I feel confident in the design, or at least parts of it, will I start to add tests.

I prefer functional tests as well: depending on the system, so long as they’re solid, you should be able to completely gut/refactor/rebuild whatever is going on internally and validate that the system still functions as expected. If those functional tests are comprehensive, they will very likely cover the internal components as well.

Dave Nicolette/neopragma:

Against TDD