What are your thoughts on starting processes in test mode?

In general, if we follow the default way of application setup, all of our processes are started in test mode the same way they are in dev and prod mode. However, from a unit testing point of view, that strikes me as incorrect. Shouldn’t we be initializing our processes prior to each test? I have started creating two start paths for my application - one for test mode that starts very few processes, and one for every other mode which starts the whole tree. In my setups, or in individual tests, I start the required processes using start_supervised, often feeding in the required mocks for that process to do its thing.
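For example, a setup might look roughly like this (RateLimiter and MockClock are made-up names, just to sketch the idea):

defmodule RateLimiterTest do
  use ExUnit.Case, async: true

  setup do
    # Start only the process this test needs, injecting a mock collaborator.
    limiter = start_supervised!({RateLimiter, clock: MockClock})
    %{limiter: limiter}
  end

  test "allows the first request through", %{limiter: limiter} do
    assert :ok = RateLimiter.check(limiter, "user-1")
  end
end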

What are all your thoughts on starting processes in test mode? Do you have to code in ways to clear or set their states prior to tests (special functions/messages meant only to be used by tests)? Do you ever run your application in test mode to do user tests? Or just ExUnit?

IMO: no. With exceptions :wink:

If you are testing the application, then having it run in a mode that is not usual means you are not testing reality. If you are manually setting up necessary processes in tests, then as the application develops / evolves you risk drift between the processes running during your tests and what the application actually does. Again, not testing reality.

And tests that don’t test reality are mainly useful for making developers feel good.

The exception, imho, is tests that do not interact at all with processes: tests that strictly exercise specific side-effect-free functions, not running in or communicating with other processes. In those cases the only impact is performance (having to wait for the application to come up before the tests run), unless some of those processes go off and start processing (e.g. from a job queue) automatically, which may not be desired, but that shouldn’t happen anyway in a well-behaved application run in test mode.

Just my 0.02 :slight_smile:

Are you suggesting that unit tests should not interact with the processes?

I have run into a few cases where I have a GenServer that looks up some data from the database on init, so if I don’t selectively start the process during the test, after I have set up the test data, the GenServer won’t have the correct state for the test.
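Concretely, the shape is something like this (PricingCache, Repo and Price are placeholders for the real modules):

defmodule PricingCacheTest do
  use ExUnit.Case

  setup do
    # Insert the fixture first, because the GenServer reads the table in init/1.
    Repo.insert!(%Price{sku: "ABC", amount: 100})
    start_supervised!(PricingCache)
    :ok
  end

  test "serves the seeded price" do
    assert PricingCache.lookup("ABC") == 100
  end
end

If PricingCache were started with the rest of the application, it would have initialized before the fixture existed.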

No, I think the application should run as it normally does when tested, because then the tests are testing the actual application. That should be a tautology: we should always be testing reality. But TDD has gotten so twisted around that many test suites test a fantasy version of the code rather than reality. Mocking is my favourite example of this; while it has its uses where it is invaluable, those are far less frequent than one would think given how often it is used in day-to-day testing.

Testing fantasies is not useful.

So I would let the application fire up all its processes as normal. That is how one can catch race conditions, issues with supervision trees, incorrect interactions between modules/functions, etc. It’s a gradient from unit testing individual functions up to full-on full-system integration testing in a staging environment, and imho our daily test suites ought to be living somewhere in the middle of that gradient.

The exception is when testing pure functions in isolation. No side-effects and low-to-no interactions with other code. Then it doesn’t matter if Elixir processes are spun up or not.


Then you have a race condition in your production code. You may never hit it due to your deployment strategy, wherein you are careful to always start your application in front of an already-populated database, but the day that assertion is not upheld (and there is no way to enforce it in practice) you will hit that latent bug.

I would treat that as a signal that loading that data on init() is wrong, and that it should be waiting for an ‘all clear’ signal from the database before loading its data.
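One possible shape of that, as a sketch (using handle_continue plus a retry rather than a literal signal; Repo and Price are placeholders):

defmodule PricingCache do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # Return immediately; the load happens after init, so startup no longer
    # depends on the database already being populated.
    {:ok, %{prices: %{}}, {:continue, :load}}
  end

  @impl true
  def handle_continue(:load, state) do
    case Repo.all(Price) do
      [] ->
        # Nothing there yet; try again later instead of caching an empty state.
        Process.send_after(self(), :retry_load, 1_000)
        {:noreply, state}

      prices ->
        {:noreply, %{state | prices: Map.new(prices, &{&1.sku, &1.amount})}}
    end
  end

  @impl true
  def handle_info(:retry_load, state), do: {:noreply, state, {:continue, :load}}
end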


It matches well how I usually test these days - the easy bits not at all, the hard bits with unit tests, and then just to make sure there’s no typo maybe one or two integration tests that check the “wiring” (which is often the easy bits). Having processes running for unit tests does not matter, and for the integration tests it’s required. I do, of course, often have some construct like:

defmodule SomeBusinessLogic do
  @persistence_mod if Mix.env() == :test, do: MockPersistence, else: MySqlPersistence
  # or @persistence_mod Application.get_env(....)

  def do_some_calc(account_id) do
    account = @persistence_mod.fetch(account_id)
    # do some calcs.
  end
end

if I really don’t want to talk to the database, even during “integration” testing. The mock typically is able to report what happened during the test, so you can assert that the right calls got made.
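The recording part can be as small as an Agent; a sketch of what the MockPersistence referenced above could look like:

defmodule MockPersistence do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> [] end, name: __MODULE__)

  def fetch(account_id) do
    # Remember the call so the test can assert on it, then return canned data.
    Agent.update(__MODULE__, &[{:fetch, account_id} | &1])
    %{id: account_id, balance: 0}
  end

  def calls, do: __MODULE__ |> Agent.get(& &1) |> Enum.reverse()
end

A test can then assert {:fetch, 42} in MockPersistence.calls() after exercising SomeBusinessLogic.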

… imo, only for testing pure functions without side-effects. Yes, that’s often a non-trivial amount of code in a functional program, but in Elixir programs there is often just as much (if not more) code that runs in processes which interact via messaging. Once you have modules that are modeled as processes, the game changes quite a bit, as the possibilities of deadlocks, race conditions, wrong message shapes, etc. all start to arise.

GenServers typically provide public functions that perform the actual call/cast as the lone API to be used by other modules. MyModule.perform_some_calculation(input) is likely to have a single line body of GenServer.call(__MODULE__, {:perform_some_calc, input}) or something similar. If any of the functions which are called during the handle_* function end up making a blocking call to the same process (directly or via a call-cycle), this will not be caught during testing if the processes are mock’d out or just not running.
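A contrived sketch of that call-cycle (names are made up): the handle_call re-enters the same process through its own public API, so the inner call blocks until it times out - and nothing catches it unless the process is actually running in the tests.

defmodule Calculator do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def perform_some_calculation(input), do: GenServer.call(__MODULE__, {:perform_some_calc, input})
  def lookup_rate(key), do: GenServer.call(__MODULE__, {:lookup_rate, key})

  @impl true
  def init(:ok), do: {:ok, %{rate: 2}}

  @impl true
  def handle_call({:perform_some_calc, input}, _from, state) do
    # Deadlock: this calls back into the very process that is busy handling
    # this message, so the inner GenServer.call can never be answered.
    rate = lookup_rate(:rate)
    {:reply, input * rate, state}
  end

  def handle_call({:lookup_rate, key}, _from, state), do: {:reply, Map.fetch!(state, key), state}
end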

Similarly for the message shape used in calling into processes. If the message-calling API and handlers are not tested, and tested together, they easily fall out of sync. This is easier when they are kept close (as in the GenServer example above) but rather harder as an application grows and potentially even splits into several libraries.

One either gives up on testing much of the application, leaving the test suite to cover only a portion of it (which, admittedly, is one possible strategy), or the tests need to be run with the processes running and making the actual calls they would make when run “for real”.

Testing fictions is harmful in the same way that wearing a bike helmet that is not fastened properly is: it makes you feel safer (which actually leads some people to take more risks while riding) but doesn’t really offer the safety it is intended to. Why bother?

I feel that much of the received wisdom around unit testing, mocking, etc. comes from a much simpler time when programs were linear and synchronous. Elixir does a great job of hiding the asynchronicity and threading models underlying the BEAM, but it does not get rid of their dynamics when it comes to deadlocks, race conditions, etc.

The example @mhanberg gave is a classic example.

Reminds me of this blog entry: Unit testing anti-patterns: Structural Inspection

It is not quite the same thing, but suffers from similar challenges. In the db example you gave, it is only testing what the developer thinks they’ll get back from the database. It does not actually demonstrate that this is what the database will return. Tests that return what you already think should happen only test your mental model of the software you’ve written; they don’t actually test the software you’ve written.

This is not academic. If you pass a ‘raw’ Ecto schema struct to a JSON library to serialize to JSON, it will typically fail. The type of failure will depend on which JSON library the program uses. It may also leak implementation details if associations aren’t preloaded … if the developer forgets any of that, it is likely that they will also forget to create an accurate mock. Their tests will pass, they will be happy, and then they will actually run it and get failures.
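For example, assuming a typical Ecto schema and the Jason library (User is a placeholder schema):

user = %MyApp.User{name: "Ada"}

Jason.encode!(user)
# ** (Protocol.UndefinedError) protocol Jason.Encoder not implemented for %MyApp.User{...}

# And even after deriving Jason.Encoder for the schema, an association that was
# not preloaded is an %Ecto.Association.NotLoaded{} struct, which fails (or
# leaks implementation details) in exactly the way described above.

A hand-written mock that returns a plain map never exposes either problem.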

Should the database schema change, tests will either continue to work when they should not[1], or the mocks need to be updated manually, often taking time and frustrating developers, only to create a new fiction that is just waiting to break again when reality once again changes.

Errors … what is the shape of the error response when fetch(account_id) fails in one way or another? Yes, you can run failing queries by hand to see what the response looks like and then extend your mock further, or you could just use the actual db library backed by an actual db and let it return the actual error codes that your application can receive at actual runtime. The result: actually useful tests.

It is not enough to just know that the “right calls” are being made if the “right calls” work differently when actually called. Tests should prove the developer’s mental model of the software, not tautologically prove that the developer’s expectations (the tests) are the developer’s expectations (the mocks they wrote for the tests).

Mocking absolutely has its uses, but faking data from easily modeled[2] local sources of equivalent truth is one of the worst imho.

I would rather have fewer tests and have code coverage show that certain parts of the code base need to be tested manually, than have the false sense of security of code coverage that is based on fiction.

I know this is not a popular opinion in many software dev circles these days, but the number of useless and even harmful test suites I have come across over the years has led me to reconsider the “common wisdom” on this matter.

[1] When the database is managed by someone other than the developer writing the application code that uses it, such issues become more frequent.
[2] Ecto gives us migrations and the ability to do db seeding.


The GenServer API functions and forwarding-to-business-logic implementation functions typically are “obviously correct” so I don’t test them. I test the underlying business logic which preferably is in a separate and purely functional module. Similarly, stuff that relies on some GenServer gets a mocked one. This way I keep my tests focused on the hard stuff and have a minimum of testing that requires interaction between multiple components. It requires a bunch of refactoring, but works for me. YMMV.
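The split looks roughly like this (names are illustrative):

defmodule Billing do
  # Pure business logic: unit-tested directly, no process needed.
  def total(line_items) do
    Enum.reduce(line_items, 0, fn %{price: price, qty: qty}, acc -> acc + price * qty end)
  end
end

defmodule BillingServer do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  # The "obviously correct" forwarding layer that I leave untested.
  def total(line_items), do: GenServer.call(__MODULE__, {:total, line_items})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:total, line_items}, _from, state), do: {:reply, Billing.total(line_items), state}
end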


I concur - I think you get a real benefit out of testing your GenServers outside of a process - primarily robustness, as you can focus on the edge cases. You have to be careful about over-testing, though.

That being said, I find aseigo’s arguments compelling, and I have implemented some integration tests that have been very helpful. The integration code tests a whole lot more in far fewer lines than the unit tests, and it caught a few order-of-operations bugs as well. However, it took a decent amount of time to set up, as some mocks were required, and I have had one un-reproducible failure in CI (I have only done a few builds there so far) - which is why I shied away from it originally. I’ll have to wait and see how it shakes out over the next few weeks.