Sounds like Jay and I are strongly in the minority here.
I agree with this 100%. In fact, the other (looser, but equally central) tenet of my approach to testing is always to test as little as you can get away with. Obviously this is hugely domain-dependent, or even feature-dependent, but I find that a lot of other devs I’ve worked with tend either to test almost nothing or to over-test significantly. If I’m writing a test, it’s either because 1) I’m TDDing the happy path, or 2) I’m proving a bug fix. Diminishing marginal utility certainly applies to tests (but I’ve rarely worked on much ‘mission critical’ software, where priorities can be vastly different).
Also 100% agree. As I said in the original post, this is one of the very few rules I try to consistently enforce (there are many patterns I avoid and anti-patterns I think are fine, depending on context, cf), and it does take enforcing, because as many have noted, it’s often very inconvenient.
That said, I’m curious about this claim:
I’m wondering what would be “more readable” about tests that combine setup and assertions, aside maybe from fewer LOC? Readability is probably what I would call the clearest and most consistent advantage of 1APT. As @al2o3cr pointed out, it’s simply undeniable that it costs more cycles. But when a test breaks, it seems relatively uncontroversial that the 1APT approach in his example is much easier to parse quickly and to pinpoint the precise issue, which is my number one priority when I am looking at a test for almost any reason (a regression, trying to understand an API better).
I would organize our test cases along these lines:
describe "when delete is requested" do
  setup do
    conn =
      conn
      |> log_in_<%= schema.singular %>(<%= schema.singular %>)
      |> delete(Routes.<%= schema.route_helper %>_session_path(conn, :delete))

    %{conn: conn}
  end

  test "redirected to root", %{conn: conn} do
    assert redirected_to(conn) == "/"
  end

  test "removes session", %{conn: conn} do
    refute get_session(conn, :<%= schema.singular %>_token)
  end

  test "adds flash", %{conn: conn} do
    assert get_flash(conn, :info) =~ "Logged out successfully"
  end
end
so that if something in the flash logic breaks I get F "when delete is requested adds flash"
while the other tests stay green. For me this is a payoff worth a good number of CPU cycles.
I can’t share the actual code but recently I had a junior dev write a reporting test like this:
describe "report name" do
  setup do
    create_row_x()
    create_row_y()
    # like 20 more lines of this
  end

  test "works", %{result: result} do
    assert %{x_count: 1, y_count: 5, z_count: 7.23} = result
  end
end
I needed to make a change to just one of the factories used in the setup and the single assertion broke (this example certainly speaks to the ambiguity of the concept of “an assertion”). It was essentially impossible to tell what the problem was. I had to go through the setup line by line and figure out which of the 5 or 6 calls to that factory was responsible, without knowing which details were essential to the assertion and which were simply convenient when the test was written.
Now, rewriting that test module to use 1APT was not convenient at all, since a lot of the setup was necessary to get the report to return anything. And it certainly ran much slower (I may even go back and benchmark the change, because I’m now curious how much). But the result looked something like this:
setup do
  # data required for report to return anything, probably 7 lines
end

describe "report with 1 x" do
  setup do
    create_x()
    %{result: exec_report()}
  end

  test "returns 1 for x_count", %{result: result} do
    assert %{x_count: 1} = result
  end
end

describe "report with 1 x and 2 y" do
  setup do
    create_x()
    create_y()
    create_y()
    %{result: exec_report()}
  end

  test "returns 0.5 for x_y_ratio", %{result: result} do
    assert %{x_y_ratio: 0.5} = result
  end
end
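If anyone wants to run the shape of this locally, here is a minimal self-contained sketch. To be clear about what is made up: `ReportSketch.exec/1` and the `:base`, `:x` and `:y` rows are hypothetical stand-ins for the real report and factories, which I can't share. The one thing it relies on from ExUnit is that a module-level `setup` runs before the `setup` inside each `describe`.

```elixir
ExUnit.start()

# Hypothetical stand-in for the real report (assumption, not the actual code):
# counts rows by kind and derives a ratio from the counts.
defmodule ReportSketch do
  def exec(rows) do
    x = Enum.count(rows, &(&1 == :x))
    y = Enum.count(rows, &(&1 == :y))
    %{x_count: x, y_count: y, x_y_ratio: if(y > 0, do: x / y, else: nil)}
  end
end

defmodule ReportSketchTest do
  use ExUnit.Case, async: true

  # Shared baseline: the data every report needs before it returns anything.
  # Runs before the describe-level setup blocks below.
  setup do
    %{rows: [:base]}
  end

  describe "report with 1 x" do
    # Only the data this one assertion depends on.
    setup %{rows: rows} do
      %{result: ReportSketch.exec([:x | rows])}
    end

    test "returns 1 for x_count", %{result: result} do
      assert %{x_count: 1} = result
    end
  end

  describe "report with 1 x and 2 y" do
    setup %{rows: rows} do
      %{result: ReportSketch.exec([:x, :y, :y | rows])}
    end

    test "returns 0.5 for x_y_ratio", %{result: result} do
      assert %{x_y_ratio: 0.5} = result
    end
  end
end
```

Saved as a `.exs` file, this should run with plain `elixir`. The point is that each `describe` names the data its assertion actually depends on, so a failure in one test name tells you which slice of setup to look at.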
This is the main reason I tend to enforce this pretty strongly, even when there are some undeniable disadvantages. I wouldn’t want a discussion about CPU cycles vs readability to hold up a PR review. I want there to be a clear standard in place, at least for each type of test (controller vs context vs e2e, etc.). I think it’s important to the value of DAMP that it’s not a dogmatic pattern (those are the ones I dislike the most, and there seem to be plenty). But there is little bikeshedding more unfortunate than test bikeshedding, I think.