… imo, only for testing pure functions without side-effects. Yes, that’s often a non-trivial amount of code in a functional program, but in Elixir programs there is often just as much (if not more) that is run in processes which interact via messaging. Once you have modules that are modeled as processes, the game changes quite a bit as the possibility of deadlock, race conditions, wrong message shape, etc. all start to arise.
GenServers typically provide public functions that perform the actual call/cast as the lone API to be used by other modules. `MyModule.perform_some_calculation(input)` is likely to have a single-line body of `GenServer.call(__MODULE__, {:perform_some_calc, input})` or something similar. If any of the functions called during the `handle_*` function end up making a blocking call to the same process (directly or via a call cycle), this will not be caught during testing if the processes are mocked out or simply not running.
Similarly for the message shape used in calling into processes. If the message-calling API and handlers are not tested, and tested together, they easily fall out of sync. This is easier when they are kept close (as in the GenServer example above) but rather harder as an application grows and potentially even splits into several libraries.
One either gives up on testing much of the application, leaving tests to cover only a portion of it (which, admittedly, is one possible strategy), or tests need to be run with the processes running and making the actual calls they would make when run “for real”.
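The latter approach is straightforward with ExUnit: start the real process under the test supervisor and exercise the public API, so the call site, message shape, and handler clause are all tested together. A sketch, assuming a hypothetical `Price` GenServer whose `quote_for/1` wraps a `GenServer.call`:

```elixir
defmodule PriceTest do
  use ExUnit.Case, async: true

  test "quotes go through the real process" do
    # start_supervised!/1 starts the GenServer linked to the test
    # supervisor and tears it down between tests.
    start_supervised!(Price)

    # This exercises the public function, the message tuple, and the
    # matching handle_call clause as one unit; a deadlock or a message
    # shape that has drifted out of sync fails here (by timeout or
    # FunctionClauseError) instead of slipping through a mock.
    assert {:ok, _price} = Price.quote_for(:widget)
  end
end
```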
Testing fictions is harmful in the same way that wearing a bike helmet that is not fastened properly is. It makes you feel safer (which actually leads some people to take more risks while riding) but doesn’t really offer the safety it is intended to. Why bother?
I feel that much of the received wisdom around unit testing, mocking, etc. comes from a much simpler time when programs were linear and synchronous. Elixir does a great job of hiding the asynchronicity and threading models underlying the BEAM, but it does not get rid of their dynamics when it comes to deadlocks, race conditions, etc.
The example @mhanberg gave is a classic example.
Reminds me of this blog entry: *Unit testing anti-patterns: Structural Inspection*
It is not quite the same thing, but it suffers from similar challenges. In the db example you gave, the test only checks what the developer *thinks* they’ll get back from the database. It does not actually demonstrate that this is what the database will return. Tests whose mocks return what you already think should happen only test your mental model of the software you’ve written; they don’t actually test the software you’ve written.
This is not academic. If you pass a “raw” Ecto schema struct to a JSON library to serialize into JSON, it will typically fail. The type of failure will depend on which JSON library the program uses. It may also leak implementation details if associations aren’t preloaded … if the developer forgets any of that, it is likely that they will also forget to create an accurate mock. Their tests will pass, they will be happy, then they will actually run it and get failures.
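For example (a sketch with a made-up `MyApp.User` schema; the exact failure shape depends on your JSON library, so verify against the one you use): a raw schema struct carries a `__meta__` field and `%Ecto.Association.NotLoaded{}` placeholders, neither of which has a JSON encoding.

```elixir
defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    has_many :posts, MyApp.Post
  end
end

# With Jason, encoding the raw struct fails because no Jason.Encoder
# implementation exists for it (encode!/1 raises, encode/1 returns an
# error tuple):
#
#     Jason.encode!(%MyApp.User{name: "ada"})
#
# A common fix is to derive an encoder for an explicit whitelist of
# fields, so __meta__ and unloaded associations never leak:
#
#     @derive {Jason.Encoder, only: [:id, :name]}
#     schema "users" do
#       ...
#     end
```

A mock that hands the test a plain map skips this entire failure mode, which is exactly the point: the test passes while the real pipeline breaks.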
Should the database schema change, tests will either continue to work when they should not[1], or the mocks need to be updated manually, often taking time and frustrating developers, only to create a new fiction that is just waiting to break again when reality once again changes.
Errors … what is the shape of the error response when `fetch(account_id)` fails in one way or another? Yes, you can run failing queries by hand to see what the response is and then extend your mock further, or you could just use the actual db library backed by an actual db and let it return the actual error codes that your application can receive at actual runtime. The result: actually useful tests.
It is not enough to just know that the “right calls” are being made if the “right calls” work differently when actually called. Tests should prove the developer’s mental model of the software, not tautologically prove that the developer’s expectations (the tests) are the developer’s expectations (the mocks they wrote for the tests).
Mocking absolutely has its uses, but faking data from easily modeled[2] local sources of equivalent truth is one of the worst imho.
I would rather have fewer tests and have code coverage show that certain parts of the code base need to be tested manually, than have the false sense of security of code coverage that is based on fiction.
I know this is not a popular opinion in many software dev circles these days, but the number of useless and even harmful test suites I have come across over the years has led me to reconsider the “common wisdom” on this matter.
[1] When the database is managed by someone other than the developer writing the application code that uses it, such issues become more frequent.
[2] Ecto gives us migrations and the ability to do db seeding.