After re-watching José’s keynote from last year, I was struck by his belief that reducing the number of tests you need to write is not something typing brings to the table. This really hit home, as it’s something I’ve suspected for a very long time now. For the eight months I was forced to use Dialyzer at work, it changed absolutely nothing about how I wrote tests. Searching the internet has been of little help: it’s easy to find accounts of people making this claim (sometimes claiming the test-count reduction is significant), but almost always without examples. The best I’ve found are tests whose sole purpose is to assert on the types.
Can anyone provide any insight here? If not, it’s cool if this post slips, unanswered, into oblivion. If I get some examples of tests I wouldn’t write anyway, I have no plans to jump in and lecture about why I wouldn’t write them; I’m more interested in uncovering a hole in my testing strategy. I’ve been working solo for over a year now (and previously at a company that wasn’t big on TDD), so I don’t have anyone to riff on this stuff with at work.
I also don’t want this turning into another debate on the merits of static typing. We did that already.
From my experience as a solo dev, the most important tests are black-box integration-level tests that exercise the public contract of the app (UI if it’s a web app, the APIs if it’s a backend service). These tests are implementation-agnostic, so whether I’m using types or not has zero effect on them.
My app is a server-side rendered Phoenix app. I’m testing it by making multiple HTTP requests and asserting on the content of the responses. I have 173 of those tests plus 9 doctests and my suite takes ~6s to run. I threw away all other tests (no unit tests for schema or contexts). This gives me immense freedom in refactoring and rewriting the code.
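For flavor, a minimal sketch of what one of those request-level tests can look like in Phoenix (the module name, route, and page text are hypothetical, not from my actual app):

```elixir
defmodule MyAppWeb.ArticleFlowTest do
  # Black-box test of the HTTP contract: no knowledge of schemas or
  # contexts, just a request in and an assertion on the response body.
  use MyAppWeb.ConnCase, async: true

  test "a visitor can read a published article", %{conn: conn} do
    conn = get(conn, "/articles/hello-world")
    assert html_response(conn, 200) =~ "Hello, world"
  end
end
```

Because nothing here mentions contexts or schemas, the whole implementation underneath can be rewritten without touching the test.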
I could see Dialyzer and/or a future type system eliminating some specific categories of tests, but not all development practices are going to produce them.
For instance, a strict “functions must have type guards” practice combined with a strict “no code without a test” strong-TDD practice could produce code like:
# function definition
defmodule Somewhere do
  def some_function(a, b) when is_integer(a) and is_integer(b) do
    a + b
  end
end

# corresponding test (the non-integer argument is illustrative)
assert_raise(FunctionClauseError, fn ->
  Somewhere.some_function(:not_an_integer, 2)
end)
You write tests how I sometimes dream of writing tests, which is pure end-to-ends. We’re still very similar in that I don’t write unit tests, but I do test my contexts. My only experience with Phoenix has been full-stack LiveView apps, and I like to keep a very strict boundary between MyApp and MyAppWeb. I consider them separate applications with a strict one-way dependency (except for MyAppWeb showing up in MyApp.Application’s supervisor), even though I have yet to add another client served by MyApp.

TDDing my contexts also helps keep their design honest, even though I’ll fully admit that there is a lot of CRUD that ends up being largely the same boilerplate tests, though these are not the types of tests that would be eliminated by types.

Also, because there are exceptions to every rule, I do write some unit tests for utility functions, usually in the form of a doctest. These often live in a separate namespace from MyApp or MyAppWeb since they are essentially library functions.
You got me there, as I’ve actually totally written tests like that before. I’d feel icky about it, and it led me to start only writing guards for control flow. My general strategy in the past couple of years has been to ensure that all data has been cast to a known shape at the outer bounds, so these types of things shouldn’t happen. What I really should be doing on top of that is using property-based testing, and I’m a little annoyed with myself that this thread has led to me exposing that I still don’t. Do you think types make property-based tests redundant?
My main point is that if your tests can be replaced by types, I would argue they are most likely tests not worth writing anyway. For example, I rarely see the purpose in checking for FunctionClauseError (and similar).
On the flip side, believing that types replace tests (and docs) will lead you to lacking tests (and lacking docs).
To be clear, using guards like that isn’t “bad”, but different folks will consider it varying levels of “useful”. Type systems with runtime checking (e.g., Sorbet) essentially generate guards / early exits that check every parameter! There’s a spectrum for runtime type-checking (from “guard every function, even private ones” to “YOLO LET IT CRASH”), just like there’s a spectrum for control-structure usage (from “only ever use pattern matching” to “every function has a with”).
Re: property testing – I haven’t used it personally, but it looks cool. As for types replacing tests, I haven’t seen any typing scheme that could completely accomplish that. For instance, I’m not aware of a type system that could correctly spot that this function is wrong:
def profit(costs, revenue) do
  revenue + costs  # bug: should be revenue - costs
end
(if somebody knows of a Haskell implementation of double-entry bookkeeping that can catch this with types, I’d love to hear about it!)
Ya, all of that is exactly how I feel too, and I guess I didn’t convey that very well (as I sometimes have trouble with on this forum… and in life). I also responded pretty hastily to the other comments, so it wasn’t very complete. Case in point:
I certainly wasn’t trying to say that! I generally get rid of those guards because I do my best to follow that very “no code without a test” practice and found them not to be very useful. To dig into the + example: in a language like Ruby that overloads the ever-living-heck out of operators, you’ll feel a greater need to test the sad paths. Since Elixir very nicely does not do any operator overloading (except for ints and floats… maybe there is something else I’m not thinking of), we’re not going to find ourselves in a situation where def add(a, b), do: a + b is going to work with strings, dates, CustomTypeImplementingPlus, etc. So in these cases, if we’ve cast any untrusted data into known structures at the outer boundaries and we have good integration tests, add/2 receiving anything other than integers would be an exceptional situation. If somehow it ever did, we can say “let it crash” and then manually fix the edge case. This of course isn’t a good story if we’re writing software that could potentially kill people, but I’ve never been in that situation.
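A tiny illustration of that point (the Money module is hypothetical): since + is only defined for numbers, bad input crashes loudly rather than quietly producing a wrong value.

```elixir
defmodule Money do
  # `+` works only on numbers in Elixir, so there is no silent string
  # concatenation or date arithmetic hiding behind this function.
  def add(a, b), do: a + b
end

Money.add(1, 2)
#=> 3

# Money.add("1", "2")
#=> raises ArithmeticError — the "let it crash" path
```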
I was just talking about replacing property-based testing.
Thank you for the responses!
EDIT: Please correct me (if you will) if I’m way off base here.
…you’d be better off using a property test in Elixir that simply asserts that costs and revenue must always be >=0 and that the result of profit must never be greater than the revenue parameter. That gives you a reasonable safety net that you are not writing something idiotic. (Though if you wanna get into the negative values, it gets a bit more involved. Still, IMO not a bad example.) And now you can move on with life.
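As a sketch, that property could look like this with the stream_data library (assuming a corrected profit/2 that subtracts costs; module name and generator ranges are illustrative):

```elixir
defmodule ProfitTest do
  use ExUnit.Case, async: true
  use ExUnitProperties

  # Corrected version of the function under test.
  def profit(costs, revenue), do: revenue - costs

  property "profit never exceeds revenue for non-negative inputs" do
    check all costs <- StreamData.integer(0..10_000),
              revenue <- StreamData.integer(0..10_000) do
      assert profit(costs, revenue) <= revenue
    end
  end
end
```

The generators enforce the >= 0 precondition, and the single assertion encodes the invariant; the buggy revenue + costs version fails as soon as a non-zero cost is generated.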
IMO no strongly statically typed language can help you here because there’s no way for your compiler to know your expectations; summing two integers / floats is a valid operation. You’ll have to have a type for each thing and combine them only through methods but then again, you can do that in any language.
On the broader topic: strong static types will help you eliminate tests where you have to explicitly assert that data whose shape is not obvious (mish-mash of maps / structs / tuples / lists) and the functions working with that data act like you expect them to. And to make bad state a compiler error.
I can’t think of a better example right now, but here’s some code from a previous contract:
I lost count of the times people got such subtle configuration hierarchies wrong (especially HTTPoison’s!) and had prod spit out errors as a result – to the point of my seriously considering writing a library to validate them (if I ever have the time and energy in this life, that is).
…And don’t even get me started on the various telemetry configs. That’s a dark forest if I ever saw one.
With Rust you can eliminate 95% of these problems by doing something like this:
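A sketch of the kind of type being described (all names are hypothetical): a config whose only constructor validates its input, so an invalid config value simply cannot exist downstream.

```rust
use std::path::PathBuf;

// Hypothetical config type illustrating "make bad state unrepresentable":
// the fields are private, so the validating constructor is the only way in.
#[derive(Debug, Clone)]
pub struct LogConfig {
    path: PathBuf,
    level: LogLevel,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LogLevel {
    Debug,
    Info,
    Warn,
    Error,
}

#[derive(Debug)]
pub enum ConfigError {
    EmptyPath,
    UnknownLevel(String),
}

impl LogConfig {
    /// Rejects invalid input up front; every LogConfig that exists
    /// after this point is already known to be valid.
    pub fn new(path: &str, level: &str) -> Result<Self, ConfigError> {
        if path.trim().is_empty() {
            return Err(ConfigError::EmptyPath);
        }
        let level = match level {
            "debug" => LogLevel::Debug,
            "info" => LogLevel::Info,
            "warn" => LogLevel::Warn,
            "error" => LogLevel::Error,
            other => return Err(ConfigError::UnknownLevel(other.to_string())),
        };
        Ok(LogConfig { path: PathBuf::from(path), level })
    }
}

fn main() {
    // Valid input constructs a config; invalid input never produces one.
    assert!(LogConfig::new("/var/log/app.log", "info").is_ok());
    assert!(matches!(LogConfig::new("", "info"), Err(ConfigError::EmptyPath)));
    assert!(matches!(
        LogConfig::new("app.log", "loud"),
        Err(ConfigError::UnknownLevel(_))
    ));
}
```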
And then pass that around wherever you need it. (NOTE: It’s possible to construct an invalid path in Rust, of course, but the point here is that you will have some validation while constructing it.) And you can use the constructors to make sure no invalid config is constructed. The constructor pattern can be used in any language, but in this case (Rust) I am demonstrating that you can formulate a type that makes it impossible to have a bad state (minus a bad path, but let’s not latch onto that; there are limits enforced by the C API of the Unix OSes, after all, and that’s not the fault of the strongly, strictly typed language).
To me, the biggest win we can score with the set-theoretic type system is finally putting these mish-mashes of keyword lists and primitive values to rest (though I am very sure that checking various dependencies’ configs is not in scope, this is what I’d use the system for when it exists).
So to me, a strong static typing system will eliminate the need for me to manually test weird data shapes.
Thinking of it, a TL;DR would be “it will help us interface with Erlang libraries”, maybe.
Being able to properly type data structures is the most interesting part of types for me. It’s all I ever missed in Ruby (and I used Virtus/DryStruct). This comment mentions casting, which would be interesting. It would be cool to have something like changesets in the standard lib that work across data types. I don’t know if that’s a big ask or a terrible idea or anything; just saying it would be cool.
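As a point of reference (and assuming Ecto is acceptable as a dependency), schemaless changesets already get partway there for bare maps; the params below are made up:

```elixir
import Ecto.Changeset

# Cast untrusted, string-keyed params into typed data at the boundary.
types = %{name: :string, age: :integer}
params = %{"name" => "Jane", "age" => "42"}

changeset =
  {%{}, types}
  |> cast(params, Map.keys(types))
  |> validate_required([:name])
  |> validate_number(:age, greater_than_or_equal_to: 0)

changeset.valid?   #=> true
changeset.changes  #=> %{name: "Jane", age: 42}
```

The `{data, types}` tuple stands in for a schema module, so this works on any map without defining an Ecto schema.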
Do libraries like HTTPoison not validate their options? I’ve noticed some libraries do, which I always appreciate, and I never really thought about whether there are ones that don’t (I haven’t had to deal with much production config in my time writing Elixir).
For me it’s kind of simple.
Tests typically want to assert behavior, not necessarily types.
I look at testing as a two-sided spectrum: the “outside” and the “inside”.
I work on the inside (unit tests) when I know what I want my code to look like and already have strong opinions.
I work on the outside (acceptance tests, headless browser tests, etc.) when I know what behavior I want but have less strong opinions about how the code should look.
I work in the middle of these two (integration tests) when I want to abstract and create boundaries between parts of my code.
The closest of all of these, in terms of checking the shape of a thing or asserting a thing is a thing, is probably the unit test. Given that unit tests are best when they cover pure functions, in most cases you are not checking the type as much as you are checking the shape.
That’s just my two cents.
One last thing to add: what would testing polymorphism look like in terms of checking interfaces?
I think that kind of test would likely hurt my head more than provide value.
While I agree there is huge value in writing tests, until I started to write pure-function unit tests I typically found writing tests really painful in almost all other languages besides Elixir. It’s one of the biggest reasons I love Elixir: writing tests tends to be much less painful.
I don’t love complex type systems. My preference is for the types to fit the testing trophy, which to me means that the purpose of the types is to give the quickest feedback possible (right in the IDE) that something is off, before the code is even run. From that perspective, types supplement the developer’s experience.
Sadly I can’t cite exact examples right now – in the last few years I’ve worked with a mix of Elixir, Golang and Rust and some Elixir details are starting to slip away from my memory – but I have witnessed all three options:
Library validates the config you give it and spits out a generic, oh-so-helpful “invalid config” atom (or raises), often preventing your app from booting in the first place. A few said exactly what was wrong, but they were a vanishing minority.
Library does not check the shape of the config and just blows up at runtime.
Library checks the shape of the config and silently reverts to a default when it doesn’t like your input.
The last one is my favorite. One of the teams I was on once lost 3 hours tracking down such a problem (unluckily for them, I was out on sick days at the time). They just assumed that the library would be more vocal and would not silently do the wrong thing. And as we know, assumptions lead to gray hairs.
Yeah, that comment’s point 1 is my main goal and foreseen value-add from any more static typing than we have right now: if we can make bad state hard (or even impossible) to compile, I personally guarantee many teams will save time – I’ve witnessed a good amount of time being wasted on such problems.
Back to your OP: in my experience, strong static typing does not necessarily translate to fewer tests per se, but when you have a dynamic language it’s also good practice to assert on certain potential problems stemming from the dynamism (and IMO my example above is an okay demonstration of one such case).
So strong static typing would eliminate the need for certain double-checks, let’s say. I still view that as a win. How big a win really depends on team dynamics, culture, and practices – but a win nonetheless.
Also what @stefanchrobot said, with three hands raised. Many people cannot assess and positively evaluate something that removes certain problems and annoyances from appearing in the first place. Case in point: you are not thanking your trees for capturing CO2, are you? Well, maybe every now and then you should.
Strong static typing helps you avoid a class of problems early.
100% correct! That’s why many people make the very valid and strong argument that being able to deconstruct a function’s input arguments via pattern matching removes most of the benefits of strong static typing. I don’t disagree with that argument – I like it and I support it – but I still think a bit more static typing will help. It will just help with other things, not that one thing in particular.
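A quick sketch of what that argument means in practice (module and fields are hypothetical): the function head itself enforces the input’s shape at runtime, covering some of what a static checker would do at compile time.

```elixir
defmodule Stats do
  # The head only matches maps carrying :total and :count with count > 0;
  # anything else raises FunctionClauseError right at the call site.
  def average(%{total: total, count: count}) when count > 0 do
    total / count
  end
end

Stats.average(%{total: 10, count: 4})
#=> 2.5

# Stats.average(%{total: 10})
#=> raises FunctionClauseError
```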
This is why I just don’t bother with unit tests anymore (though I do like doctests). When I know what my small functions look like, the only reason I would write tests for them in the first place is to help guide their design. If their design is obvious, I don’t get any sort of nagging feeling about it not having a unit test, as I know my integration tests are exercising it. I fully sympathize with people who still want to write these tests, though. I do keep some around for more complex functions.
Not sure if you are talking in general or in terms of types, or if I even fully understand your ask, but I make a configurable macro in these cases that writes a test for me. Not sure if you know Rails, but they are akin to RSpec’s “shared examples”. This way I don’t need to create a dummy implementation just for testing, and I can throw a one-liner into each implementation’s tests that basically just catches regressions. Again, maybe that’s not what you meant.
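For anyone unfamiliar, a rough sketch of that macro approach (all names hypothetical, loosely shaped after RSpec’s shared examples):

```elixir
defmodule SharedExamples do
  # One-liner a test file can drop in to get a common regression suite
  # for any module implementing the same interface.
  defmacro behaves_like_codec(module) do
    quote do
      test "encode/decode round-trips" do
        mod = unquote(module)
        assert {:ok, encoded} = mod.encode(%{id: 1})
        assert {:ok, %{id: 1}} = mod.decode(encoded)
      end
    end
  end
end

# In an implementation's test file (inside a module that uses ExUnit.Case):
#
#   import SharedExamples
#   behaves_like_codec(MyApp.JsonCodec)
```

Each implementation gets the same behavioral checks without a dummy module existing purely for the test suite.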
I certainly agree that testing in Elixir is a huge step up. Well, I only really know testing in Ruby and JS otherwise, but LiveView tests changed everything. They are one step away from being e2es that are simple to write and super fast to run. Then you just need to sprinkle in a few Wallaby tests if you really want to be thorough.
Lol, ya, I remember those from Ruby libraries. I just haven’t come across it much in Elixir due to only working on a production system with actual users for about 8 months (and I didn’t really handle any library config).
I definitely see this, though if a library is validating config I see it as less of a problem. The tests in the library itself would only really need to test the happy path too, I feel. I could be wrong. Other than messing around with OCaml and Haskell, I don’t really have experience with types (which I’ve probably told you several times before, lol). “Team dynamics, culture, and practices” obviously play a huge part, but I don’t want to get too into discussing the general value of types again!
I write tests to reduce the risk of change in the future. I found that forcing myself to use tests to drive design was sometimes a huge hindrance to my creative thought.
I often write tests last now; I know I skip a lot of the “is the test even valid” check by skipping the red phase.
Instead, I often write the code I wish I had (sometimes writing tests in between), flesh out the logic to support it, then harden it with tests. If I refactor, it’s more of a bigger rewrite. That’s not to say this is a good workflow, but it works for me.
The key to not having tests hinder design is to do “spikes”, i.e., prototypes you throw away. The most important design factor TDD helps me with is, as you said, boundaries at the integration layers. In any event, I don’t judge the way people do things too much—unless you are on my team. But there I just want people to be on the same page, and it’s chaos when a single team is using different methodologies.
Of course this is an interesting take that I now see @dimitarvp shares. What is the context of this? Like, you don’t want people to go on massive pet refactors? You don’t want features to change too often? It’s not a sentiment I’ve thought about before. Oooooops, I totally misread that!
I have gotten jobs before that I only got because there were acceptance tests that had been used to rewrite the whole app in LiveView while having its API migrated over from a Ruby stack, more than once now.
That’s also in part why my workflow is normally the way it is: there tend to already be tests.