ExUnit and the "One assertion per test" pattern

tfwright · March 13, 2022, 7:12pm

I wanted to share a pattern I learned about way back early in my Ruby/Rails dev career and has remained central to my approach to testing just about every type of application I’ve worked on since. Like everyone, I’ve read encountered a lot of “X best practice” and “Y considered harmful” arguments over the years (I actually think it might be worthwhile creating a form section to collect and discuss those here), and this is just about the only one that I agree with almost without qualification or caveat: One Assertion Per Test. The pattern itself is a test specific application of the more general principle of DAMP vs DRY approaches to code organization, also from Jay Fields (AFAIK).

In the Rails world people seemed to follow this pattern (often unintentionally) thanks to the “one-liner” syntax defacto standard test framework, rspec. But every Elixir project’s test suite that I’ve examined seems to “group” related assertion in single blocks, and worse, shares a ton of data setup in describe blocks in I think, a misguided attempt to keep the test suite DRY, rather than DAMP (although as the name implies there is quite a bit of room for compromise here).

So I thought I’d share the original post and try to sway some people

So, anybody out there doing a version of this? Are people familiar with this idea and have counter-arguments to share? Tell me what you think!

gregvaughn · March 13, 2022, 7:38pm

I used to do that in Java-land. The idea was a 1:1 mapping of reasons to fail and tests that could fail. In the Java ecosystem, in those days (pre-Rails), true unit tests didn’t even go out of process to hit the db, so there was less of a performance concern with breaking down tests that atomically.

I don’t do that any more because it’s not worth that much “design” in test suites as well as the performance cost. I now look to test suites mostly for their regression value, so that when I’m adding some new feature or refactoring something that there isn’t some surprise dependency I wasn’t aware of that I just broke.

wojtekmach · March 13, 2022, 7:41pm

I’m personally pretty wary of such rules. I think certain types of tests would be worse off by following such rule. For example, in a controller test we might want to test different aspects of a response, what’s the status, what are the headers, what’s the body, etc. To some extent we could write them using one assertion to follow the rule but I think it would be way less readable. In high-level integration tests, especially when hitting the UI, it’d be even less practical to have just one assertion per test. For very low-level tests it seems like a pretty good rule of thumb though.

al2o3cr · March 13, 2022, 9:40pm

I’m not a huge fan of strictly following this approach - (or the Rspec one-liner syntax, TBH). Taken to extremes, it can mean slowing down a test suite by a factor of 2-3x or more due to repeated setup.

For concreteness, let’s look at some “off-the-shelf” tests as created by phx.gen.auth:

github.com

phoenixframework/phoenix/blob/71942df02a18530ff65ac688725fbeffba5fb124/priv/templates/phx.gen.auth/session_controller_test.exs#L84-L89

      
        
            test "logs the <%= schema.singular %> out", %{conn: conn, <%= schema.singular %>: <%= schema.singular %>} do
              conn = conn |> log_in_<%= schema.singular %>(<%= schema.singular %>) |> delete(Routes.<%= schema.route_helper %>_session_path(conn, :delete))
              assert redirected_to(conn) == "/"
              refute get_session(conn, :<%= schema.singular %>_token)
              assert get_flash(conn, :info) =~ "Logged out successfully"
            end

It’s not complicated to rewrite this to one-per-test:

test "logs the <%= schema.singular %> out and redirects to /", %{conn: conn, <%= schema.singular %>: <%= schema.singular %>} do
  conn = conn |> log_in_<%= schema.singular %>(<%= schema.singular %>) |> delete(Routes.<%= schema.route_helper %>_session_path(conn, :delete))
  assert redirected_to(conn) == "/"
end

test "logs the <%= schema.singular %> out and removes the token", %{conn: conn, <%= schema.singular %>: <%= schema.singular %>} do
  conn = conn |> log_in_<%= schema.singular %>(<%= schema.singular %>) |> delete(Routes.<%= schema.route_helper %>_session_path(conn, :delete))
  refute get_session(conn, :<%= schema.singular %>_token)
end

test "logs the <%= schema.singular %> out and shows a flash message", %{conn: conn, <%= schema.singular %>: <%= schema.singular %>} do
  conn = conn |> log_in_<%= schema.singular %>(<%= schema.singular %>) |> delete(Routes.<%= schema.route_helper %>_session_path(conn, :delete))
  assert get_flash(conn, :info) =~ "Logged out successfully"
end

This makes the test in question take 3x as long to run (because nearly all the work is done 3x) and IMO makes naming the blocks harder.

An approach I’ve seen to that naming problem in Rspec-land is nested describe blocks - which comes with its own “which setup will run” guessing-game, so much so that ExUnit has explicitly made describe non-nestable.

There’s also a philosophical question: what is “one assertion”? For instance, this looks like one assertion:

assert redirected_to(conn) == "/"

but redirected_to can raise, causing the test to fail, if things are not as expected:

github.com

phoenixframework/phoenix/blob/aa9e708fec303f1114b9aa9c41a32a3f72c8a06c/lib/phoenix/test/conn_test.ex#L433-L451

      
        
            @spec redirected_to(Conn.t, status :: non_neg_integer) :: String.t
            def redirected_to(conn, status \\ 302)
            
            
def redirected_to(%Conn{state: :unset}, _status) do
              raise "expected connection to have redirected but no response was set/sent"
            end
            
            
def redirected_to(conn, status) when is_atom(status) do
              redirected_to(conn, Plug.Conn.Status.code(status))
            end
            
            
def redirected_to(%Conn{status: status} = conn, status) do
              location = Conn.get_resp_header(conn, "location") |> List.first
              location || raise "no location header was set on redirected_to"
            end
            
            
def redirected_to(conn, status) do
              raise "expected redirection with status #{status}, got: #{conn.status}"
            end

if the conn is in the wrong state
if the status is not 302
if the location header is not set

So there are FOUR ways that “single assertion” can cause the test to fail.

One place I can see this technique being useful is when practicing “strong TDD” - the strict “no code can be written without a failing test” approach. Adding “One Assertion Per Test” to that practice means there’s never a point where the coder has to decide “add an assertion to an existing test or add a new test?”

A strict rule there would be useful to overcome analysis paralysis and start typing code.

BUT

That practice also includes a “refactor” step, and IMO the tests are fair game for refactoring. If strong TDD produced the three split-out SessionController tests above, a good refactor would be to note they have identical setup and actions.

BartOtten · March 13, 2022, 9:59pm

Wanted to reply, saw @gregvaughn was writing, decided he would probably say it better, discarded reply. Not disappointed.

Today was a good day.

slouchpie · March 13, 2022, 10:19pm

I’m currently working on a project that involves very complicated lifecycles. For comparison, imagine testing a chess match where you have to specify exactly where each piece is, except each piece is actually a short history of other chess matches and all their moves.

I have found it almost impossible to have one assertion per test. I find that I have to play out whole scenarios and assert along the way.

In fact, some of the tests in my current project are so big that I have accompanying diagrams saved as pngs in the test/support directory, as visual aids to help future devs understand what is even happening.

So I admire the spirit of “one assertion per test” and in theory I agree with the effort, but I would not make it an absolute rule. Like almost everything else in programming, it all depends on 1 thing - the domain, what you’re describing with code.

tfwright · March 13, 2022, 11:36pm

Sounds like Jay and I are strongly in the minority here

I agree with this 100%. In fact, the other (looser, but equally central) tenant of my approach to testing is always to test as little as you can get away with. Obviously this is hugely domain, or even feature, dependent, but I find that a lot of other devs I’ve worked with had a tendency to either not test almost at all, or significantly over test. If I’m writing a test, it’s either because 1) I’m TDDing the happy path, or 2) I’m proving a bug fix. Diminishing marginal utility certainly applies to tests but I’ve rarely working on much ‘mission critical’ software where priorities can be vastly different).

Also 100% agree. As I said in the original post, this is one of the very few rules I try to consistently enforce (there are many patterns I avoid and anti-patterns I think are fine, depending on context, cf), and it does take enforcing, because as many have noted, it’s often very inconvenient.

That said, I’m curious about this claim:

I’m wondering what be “more readable” about tests that combine setup and assertions, aside maybe from fewer LOC? That is probably what I would say is clearest and most consistent advantage of 1APT. As @al2o3cr pointed out, it’s simply undeniable that it costs more cycles. But when a test breaks, it seems relatively uncontroversial that the 1APT approach in his example is much easier to quickly parse and see the precise issue, which is my priority number 1 when I am looking at a test for almost any reason (regression, trying to understand an API better)

I would organize our test cases along these lines:

describe "when delete is requested" do
   setup do
    conn = conn |> log_in_<%= schema.singular %>(<%= schema.singular %>) |> delete(Routes.<%= schema.route_helper %>_session_path(conn, :delete))
    %{conn: conn}
   end

    test "redirected to root", %{conn: conn} do
      assert redirected_to(conn) == "/"
    end
    
    test "removes session", %{conn: conn} do
      refute get_session(conn, :<%= schema.singular %>_token)
    end
    
    test "adds flash", %{conn: conn} do
      assert get_flash(conn, :info) =~ "Logged out successfully"
    end
end

so that if something in the flash logic breaks I get F "when delete is requested adds flash" while the other tests stay green. For me this is a payoff worth a good number of CPU cycles.

I can’t share the actual code but recently I had a junior dev write a reporting test like this:

describe "report name" do
  setup do
    create_row_x
    create_row_y
   # like 20 more lines of this
  end

  test "works", %{result: result} do
    assert %{x_count: 1, y_count: 5, z_count: 7.23} = result
  end
end

I needed to make a change to just one of the factories used in the setup and the single assertion broke (this example certainly speaks to the ambiguity of the concept of “an assertion”). It was essentially impossible to tell what the problem was, I had to go through the setup line by line and figure out which of 5 or 6 calls to that factory was the problem, without knowing what details were essential to the assertion and which were simply convenient when the test was written. Now, rewriting that test module to use 1APT test was not convenient at all, a lot of the setup was necessary to get the report to return anything. And certainly it ran much slower (I may even go back to benchmark the change because I’m now curious how much). But the result looked something like this:

setup do
  # data required for report to return anything, probably 7 lines
end

describe "report with 1 x" do
  setup do
    create_x

   %{result: exec_report}
  end

  test "returns 1 for x_count", %{result: result} do
    assert %{x_count: 1}
  end
end

describe "report with 1 x and 2 y" do
  setup do
    create_x
    create_y
    create_y

   %{result: exec_report}
  end

  test "returns .5 for x_y_ration", %{result: result} do
    assert %{x_y_ratio: .5} = result
  end
end

This is the main reason I tend to enforce this pretty strongly, even when there are some undeniable disadvantages. I wouldn’t want a discussion about CPU cycles vs readability to hold up a PR review. I want there to be a clear standard in place, at least for each type of test (controller vs context vs e2e etc). I think it’s important to the value of DAMP that it’s not a dogmatic pattern (I really strongly dislike those the most, and there seem to be plenty). But there is little more unfortunate bike shedding than test bike shedding, I think.

tfwright · March 13, 2022, 11:52pm

Have to say this is one thing I strongly disagree with. I love refactoring app logic, I loathe refactoring tests. Some of my lowest moments in development that involved actual code were “refactoring” (I honestly don’t think it was an activity worthy of the name) highly over abstracted tests (we’re talking test helpers with meta-programmed helpers, and it was Ruby so everything was mutating everything). I admit I might be a bit traumatized there

tfwright · March 14, 2022, 12:00am

Will absolutely grant that this a disadvantage, maybe the primary disadvantage of 1APT. I think this is what @wojtekmach was referring to by “high level”, “integration” tests. Even with slightly less complicated state things can get hairy very quickly. I think I could happily agree to applying a modified standard in those cases.