Integration testing is hard

For several years, I’ve found integration testing in Elixir to be very hard to do right and very painful. Now, I don’t care much for integration testing, but people always seem to want to test this way: start a process, do some things to it, and then check for the side effects it causes somewhere else, possibly several processes away.

For example, suppose there is a process that handles user chat messages. The process also triggers telemetry based on the messages, and that telemetry is captured by another process and queued up to be saved in an analytics database. So the test scenario is: start a chat process, send some messages, and ensure the database contains the correct statistics. I don’t think it is an especially good idea to test this way, but it doesn’t seem entirely unreasonable either.
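To make the scenario concrete, here is a minimal sketch of what such a setup might look like. All module and event names (Chat.Room, Analytics.Collector, [:chat, :message]) are hypothetical, not from any real codebase:

```elixir
defmodule Chat.Room do
  use GenServer

  def init(state), do: {:ok, state}

  def handle_cast({:message, text}, state) do
    # Handle the chat message, then emit telemetry as a side effect.
    :telemetry.execute([:chat, :message], %{length: String.length(text)}, %{room: state.room})
    {:noreply, state}
  end
end

defmodule Analytics.Collector do
  use GenServer

  def init(state), do: {:ok, state}

  # Attached via :telemetry.attach/4 at startup. Telemetry handlers run
  # in the emitting process, so this forwards the event to the collector.
  def handle_event(event, measurements, metadata, _config) do
    GenServer.cast(__MODULE__, {:event, event, measurements, metadata})
  end

  # The collector queues events, to eventually be saved in the
  # analytics database.
  def handle_cast({:event, _e, _m, _md} = ev, state) do
    {:noreply, %{state | queue: [ev | state.queue]}}
  end
end
```

The test in question would then start a Chat.Room, cast messages to it, and want to assert on what eventually lands in the database, two process hops away.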

But how do we do it? Mock the repo, or something near the database insert? Most of the mocking libraries are kind of iffy: they can cause problems, don’t work with async tests, etc. Then there is Mox, but it only works with behaviours. So I’ve seen people add a @callback to a module just so it can be mocked, without there even being a real behaviour. I don’t like that, because now we’re creating half of a behaviour just for tests. Additionally, you need to put stuff in the Application env, causing clutter. Those are two aspects of mocking with Mox that require adding test-only code to the regular (non-test) codebase. So I don’t want to do this, because in most cases, other than Mox requiring it, there is no reason to add a behaviour.
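For readers who haven’t used Mox: this is roughly the pattern being criticized. A behaviour and an Application-env lookup exist mainly so the test can swap in a mock. All names here (MyApp.Analytics, MyApp.Chat, etc.) are illustrative:

```elixir
# The behaviour, sometimes added only so Mox has something to mock.
defmodule MyApp.Analytics do
  @callback record(map()) :: :ok
end

# The one real implementation.
defmodule MyApp.Analytics.DB do
  @behaviour MyApp.Analytics

  @impl true
  def record(stats) do
    MyApp.Repo.insert!(struct(MyApp.Stat, stats))
    :ok
  end
end

# Callers resolve the implementation via the Application env:
# test-only indirection living in production code.
defmodule MyApp.Chat do
  def handle_message(msg) do
    analytics().record(%{length: String.length(msg)})
  end

  defp analytics,
    do: Application.get_env(:my_app, :analytics, MyApp.Analytics.DB)
end

# In test_helper.exs:
#   Mox.defmock(MyApp.AnalyticsMock, for: MyApp.Analytics)
#   Application.put_env(:my_app, :analytics, MyApp.AnalyticsMock)
```

Both the behaviour and the env lookup are the "test-only code in the regular codebase" the post objects to.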

Then there’s the other obvious option, which is to add a lot of sleeps; that is bad for obvious reasons.

Now we reach slightly more esoteric techniques, like using :erlang.trace as described in e.g. this post. This is actually quite decent if you can use it: you find a process that is supposed to be called and ensure that it is in fact called, using assert_receive. But if the process gets a lot of messages, it can take a lot of time to get the right pattern for the assert, because you have to fish it out of a very long message inbox printed in the terminal. I guess it’s actually only half-decent. And now suppose the process is supposed to do a database insert. Database processes can’t be traced as easily, because there is a pool of them. Back to square one.
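A minimal sketch of the tracing technique, for reference. It assumes a registered process under test (StatsWorker and the cast payload are hypothetical names); by default :erlang.trace/3 sends trace events to the calling process, i.e. the test:

```elixir
test "the stats worker receives the message" do
  pid = Process.whereis(StatsWorker)
  # Trace message *receives* on the target; trace events are
  # delivered to this (the calling) process.
  :erlang.trace(pid, true, [:receive])

  ChatRoom.send_message("hello")

  # GenServer.cast wraps the payload in a :"$gen_cast" tuple.
  assert_receive {:trace, ^pid, :receive, {:"$gen_cast", {:stats, _}}}
end
```

This is exactly where the pattern-fishing complaint comes from: if the traced process is busy, the failure output is a dump of every traced message, and finding the right match pattern takes a while.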

Of course people are going to reply with things like “well, in JavaScript and Ruby you can just overwrite anything, and that is bad because of reasons” and “in Java you have to have an IoC container, and that is bad”. And the obvious “you are doing it wrong” / “just don’t test this way”. All I can say is: I respect and appreciate all the work various people have done to make testing in Elixir possible, and I like ExUnit, but of the more than 5 programming languages I’ve used, these kinds of tests are the most painful in Elixir.

I guess it boils down to this: testing things that happen across processes is inherently hard. That’s why I try to avoid it and test only a single process / module as much as possible. But how do you convince other people to avoid these kinds of tests? In some cases they are not that hard to write, but you pay the price later anyway, when you have to rewrite them and they are no longer easy.


Please take my post with a pinch of salt: I’ve read 2 books on Elixir, but have no production/OSS experience with it otherwise.

After painfully going through similar questions you’re raising, I’ve arrived at the following so far:

  1. Elixir processes are not OOP classes, so we can’t expect to test them the same way (e.g. create a structure of several objects, perform actions and observe side effects).
  2. Pure/functional Elixir code is easy to test.
  3. Mocking/stubbing actors is hard. I’ve seen this both in Elixir and in Orleans on .NET.

Therefore, I’ve found it’s most practical to:

  1. Have as much logic as possible in the pure functional modules. Cover these with extensive unit tests.
  2. For process-level testing, start the application (like mix test does by default), and send test messages/observe side effects as the application is running. Most likely it means having a DB running as part of your build job etc.

#2 assumes that the processes/services are very slim and delegate all decisions to the pure code. That’s not always easy, as the messages that get passed around are usually intertwined with the business logic.
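A sketch of what that slimming-down can look like, under the assumption that the process can be reduced to "pure decision, then effects". Module names (Stats.Core, Stats.Server) are made up for illustration:

```elixir
# Pure: takes state and an event, returns new state plus a list of
# side effects to perform. Unit-testable with no process running.
defmodule Stats.Core do
  def apply_event(state, %{type: :message} = _event) do
    {Map.update(state, :count, 1, &(&1 + 1)), [:persist]}
  end

  def apply_event(state, _irrelevant_event), do: {state, []}
end

# Thin shell: receives messages, delegates decisions, runs effects.
defmodule Stats.Server do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, %{}, opts)

  def init(state), do: {:ok, state}

  def handle_cast({:event, event}, state) do
    {state, effects} = Stats.Core.apply_event(state, event)
    Enum.each(effects, &run_effect(&1, state))
    {:noreply, state}
  end

  defp run_effect(:persist, state) do
    MyApp.Repo.insert!(%MyApp.Stat{count: state.count})
  end
end
```

Most of the test coverage then lands on Stats.Core, and the process-level tests only need to confirm the wiring.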

I’m not sure I’ve made my peace with the above approach yet, but this is where the toolset has been pushing me. Larger projects on GitHub that I’ve looked at (Phoenix, Nostrum) all seem to follow a similar philosophy.


I think we’re in agreement. It’s best if you can cover every module separately, so that all the separate tests combined test the entire system.

With respect to running the database: that is not ideal, but not that bad either. The challenge, however, comes when you have an interaction that goes across multiple processes; how do you know the DB driver has performed the insert you want to test, without a sleep?

In my experience, particularly if you’re doing an “integration test”, you would not mock the database at all. Ecto sandboxes are designed to work with multiple processes. Start the processes you need for your test, put Ecto in either shared mode or use allowances, and do a real “end to end” test.
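A sketch of what that looks like with the Ecto SQL sandbox, assuming the standard setup (MyApp.Repo and ChatRoom are placeholder names). Shared mode lets every process use the test’s connection; allowances grant it to specific pids:

```elixir
defmodule MyApp.ChatIntegrationTest do
  use ExUnit.Case

  setup tags do
    # Check out a sandboxed connection owned by the test process.
    # With shared: true, collaborating processes share it implicitly.
    pid =
      Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo,
        shared: not tags[:async]
      )

    on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
    :ok
  end

  test "chat messages end up as statistics" do
    {:ok, room} = ChatRoom.start_link([])

    # Alternative to shared mode: an explicit allowance for one pid.
    # Ecto.Adapters.SQL.Sandbox.allow(MyApp.Repo, self(), room)

    ChatRoom.send_message(room, "hello")
    # ...then assert on MyApp.Repo contents (after synchronizing,
    # which is the remaining problem discussed in this thread).
  end
end
```

Everything runs against the real database, and the sandbox rolls the transaction back when the test ends.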


I have a few principles I try to follow when testing processes / sets of processes:

  • Try to start processes in the test
    • isolated from anything else
    • no singletons or dependency on global state
  • Application started processes are global state (best not to be depended on, unless they provide isolation on their own (e.g. ecto))
  • Do not assert on a process’s internal state. Don’t look into the black box.
  • Assert on things observable to the outside world using public APIs of the processes or interactions with other public resources
  • Strongly prefer waiting on being messaged over sleeps.
  • Don’t forget that the test is an isolated process, which can receive messages.
  • If you want to test implementation details used within a process, extract them to a function and unit test the function.
  • Consider stateful property tests. They’re involved, but also powerful. They can be the right tool depending on the context.
  • Rearchitecting to aid testing is not a bad idea.
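Several of these principles can be shown in one small example: the process is started inside the test (no global name), the test pid is passed in so completion becomes observable as a message, and assert_receive replaces sleeping. BatchWorker is a made-up module for illustration:

```elixir
defmodule BatchWorker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def add(pid, item), do: GenServer.cast(pid, {:add, item})
  def flush(pid), do: GenServer.cast(pid, :flush)

  @impl true
  def init(opts), do: {:ok, %{items: [], notify: Keyword.fetch!(opts, :notify)}}

  @impl true
  def handle_cast({:add, item}, state) do
    {:noreply, %{state | items: [item | state.items]}}
  end

  def handle_cast(:flush, state) do
    # ...persist state.items here...
    # Tell whoever asked to be notified that the flush happened.
    send(state.notify, {:flushed, length(state.items)})
    {:noreply, %{state | items: []}}
  end
end

test "worker notifies when the batch is flushed" do
  # Started in the test, isolated, no singleton registration.
  {:ok, worker} = BatchWorker.start_link(notify: self())

  BatchWorker.add(worker, %{user: "alice"})
  BatchWorker.flush(worker)

  # The test is itself a process and can receive messages.
  assert_receive {:flushed, 1}, 1_000
end
```

The assertion is on an externally observable interaction, not on the worker’s internal state.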

Imo this is a fallacy. The use case for behaviours is to provide some level of interface where multiple implementations are to be used. If you have one implementation running that’s hard to test, and a different one meant to aid in testing, that’s multiple implementations. Even if an implementation happens to only be used in testing, it’s just as bad if that implementation starts to drift apart from what is expected of it as it would be for one used in production.

I think the general arguments about unit tests vs integration tests apply here as well.


Yes, and add sleeps to wait for the insert, which is undesirable. I don’t want to mock the database, I want to know when it has performed its tasks.

This looks like an actual fallacy namely circular reasoning:

  1. You need a behavior because without one, it’s hard to test stuff
  2. When it’s hard to test stuff, you need a behavior


  • There is a process A that generates telemetry as a side effect
  • There is a module that subscribes to this telemetry and casts it to process B
  • Process B holds some state that can lead to events being filtered. This is strongly tied to what the events generated by A look like
  • Process B saves the events to a database

Of course I can test separately whether

  • A generates the correct telemetry
  • The telemetry leads to a cast
  • The functions in B (made public) result in the correct database records

That also means

  • Copying the telemetry content across tests, because we need to test whether A sends it correctly, and then use it as input for B
  • Making a factory for the above
  • Or more likely, certain events will not be tested fully
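One way to mitigate the copying problem is a shared fixture module used by both A’s and B’s tests, so that a format change breaks both test suites instead of silently diverging. This is only a sketch; the event shape is invented:

```elixir
# Single source of truth for what an event from A looks like.
# A's tests assert it *produces* this shape; B's tests use it as input.
defmodule TelemetryFixtures do
  def message_event(overrides \\ %{}) do
    Map.merge(%{type: :message, flagged: false}, overrides)
  end
end
```

It doesn’t replace an end-to-end test, but it ties the two sides to one definition of the data.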

While your other points are well taken I don’t see a solution for an end-to-end test of the above that doesn’t involve sleeps.

Since when? Repo.insert is synchronous; when the function call has completed, the record is in the DB.

No it doesn’t. Elixir behaviours are basically a form of programming by contract. They are not something that will ruin your life; they are used to (a) increase clarity on what a certain agent in your program does and (b) help with mocking if you are so inclined.


That mention baffles me too. The only case where I would see uncontrolled concurrency happening is when somebody uses something like GenServer.cast/2, and this points to a bad design, as that function doesn’t guarantee that the message was received by the process.

Let’s try to maintain the level of the discussion. GenServer.cast/2 is a normal OTP function which serves many purposes and is used very often.

Obviously that holds if you are calling a function which calls insert directly. But once there is an asynchronous step somewhere, such as a cast which eventually leads to the insert, whether insert itself is synchronous doesn’t matter.

I think we should take a step back here because your comments strike me as a bit academic. Maybe we should discuss your particular hurdles when trying to test something concrete?

Discussions like these are rarely fruitful, because everything carries tradeoffs – there’s no one single perfect solution. Though it’s also true there are a number of solutions that are objectively worse than others. If that’s your goal here – to uncover a subset of better solutions – then OK, but arguing that, f.ex., “Repo.insert can actually be asynchronous” is not productive and won’t lead us anywhere enlightening.


This feels like two separate layers of testing to me, but I just don’t have any experience with telemetry, so I can’t answer intelligently. It would seem to me you would want to test that a chat message triggers a telemetry event, and then have a general test that telemetry events are being consumed and saved to the db (and depending on how detailed you want to get, you could test that all your event names are valid).

EDIT: sorry @hkrutzer I realize now you essentially said this already.

This can and should be reworked into sending messages to dedicated OTP process(es) and just asserting they are received. Persistence of telemetry is an implementation detail that has zero relevance to your app’s tests.


If process A does a cast to process B, then process A can afterwards call :sys.get_state on B, and after that returns you are guaranteed that the cast has also been handled. Messages between two processes A and B are always received in the order they are sent, so the call made inside :sys.get_state will be processed after the cast.

Telemetry hooks are always fired from the process that emits the telemetry event, e.g. the process calling Repo.insert. So as far as I can tell, if your test pid calls a function that calls Repo.insert, and that in turn has telemetry cast to some other pid2, your test code should be able to call :sys.get_state(pid2), and at that point you’re guaranteed to be after it has processed the cast.
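A sketch of this trick, with hypothetical names (Analytics.Collector, MyApp.Stat). Note the ordering guarantee is per sender/receiver pair, so it relies on the cast being sent from the test pid itself, which holds here because telemetry handlers run in the emitting process:

```elixir
test "telemetry ends up in the database" do
  # The telemetry handler runs in this (the test) process and casts
  # to the collector, so the cast is sent from the test pid.
  :telemetry.execute([:chat, :message], %{count: 1}, %{})

  # :sys.get_state/1 is a call from the same sender; messages between
  # one pair of processes are received in the order sent, so by the
  # time this returns, the collector has handled the earlier cast
  # (and its Repo.insert, being synchronous, has completed).
  _ = :sys.get_state(Analytics.Collector)

  assert [%{count: 1}] = MyApp.Repo.all(MyApp.Stat)
end
```

No sleep needed: the get_state call acts as a barrier behind the cast.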


That is kind of true, but the telemetry is transformed first, so there is an interdependency between the data across the modules and processes.

For example, you fire from process A some telemetry that contains a boolean value. You receive the telemetry in process B, and for some reason you need to transform it into a 0 or 1 and save it to your database. You test these things separately. Now suppose someone changes the code so that A no longer produces a boolean, but instead produces a string (or a 0 or 1, it doesn’t matter). They also change A’s tests. All looks to be fine. But after deployment, B tries to consume the telemetry and breaks, because its format has changed.

You can imagine a similar case where data isn’t transformed in this way in B, but instead filtered, merged etc.

Thanks! I didn’t think of the :sys.get_state trick. That will probably solve at least this specific case!

Not entirely, I have one process (A) that fires telemetry, which is cast to another process (B) and B then performs the insert (and in some cases waits for multiple events, to merge them together). But your suggestion should still work if I get_state(B).


I feel you on this view, as I felt it at first, but the way I see it is that we create mocks for wrappers around services (if we’re following “don’t mock what you don’t own”). Even if we only end up having one production implementation, it’s still probably a good idea to define a behaviour. For example, perhaps there are some public functions added to an implementation that aren’t actually part of the contract. That’s the way I see it, at least, but I’ve been wrong a bunch over the past 24 hours :sweat_smile:

I haven’t followed the complete discussion, but to me this sounds like the sole reason you have a hard time testing this. Instead of waiting for process A to generate telemetry, can you make it do that on command? If so, your test can command it to do what it does, and you can expect all the downstream effects to happen and assert on the resulting db records.
