Best strategy to test GenServers that use timers (send_after)


I have a group of GenServers that form a “unit” together with their parent Supervisor. One GenServer holds the state, another sends data to external services, and a third is responsible for terminating the whole unit after a certain period of inactivity.

Currently, if there is no input for 30 minutes, the “Timeout” GenServer terminates the parent Supervisor and therefore the whole unit. However, if the unit is in pause mode, the 30-minute timeout is ignored and a 24-hour timeout is started instead, which serves only as a fallback to eventually clean up the system.
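A minimal sketch of how a Timeout GenServer like this might be written with Process.send_after/3. The module name Unit.Timeout, the touch/1, pause/1 and resume/1 functions and the :inactivity_ms/:paused_ms options are hypothetical, and on timeout the sketch simply stops itself, whereas the real unit terminates its parent Supervisor. Each timer message carries a fresh token, so a message from a timer that was cancelled too late is ignored.

```elixir
defmodule Unit.Timeout do
  # Hypothetical sketch: stops after a period of inactivity and switches to a
  # much longer fallback timeout while the unit is paused.
  use GenServer

  @inactivity_ms :timer.minutes(30)
  @paused_ms :timer.hours(24)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Call on every input to the unit; restarts the current timer.
  def touch(pid), do: GenServer.cast(pid, :touch)
  def pause(pid), do: GenServer.cast(pid, :pause)
  def resume(pid), do: GenServer.cast(pid, :resume)

  @impl true
  def init(opts) do
    state = %{
      inactivity_ms: Keyword.get(opts, :inactivity_ms, @inactivity_ms),
      paused_ms: Keyword.get(opts, :paused_ms, @paused_ms),
      paused?: false,
      timer: nil
    }

    {:ok, schedule(state)}
  end

  @impl true
  def handle_cast(:touch, state), do: {:noreply, schedule(state)}
  def handle_cast(:pause, state), do: {:noreply, schedule(%{state | paused?: true})}
  def handle_cast(:resume, state), do: {:noreply, schedule(%{state | paused?: false})}

  # Only the most recently scheduled timer counts: its token must match.
  @impl true
  def handle_info({:timeout, token}, %{timer: {_ref, token}} = state) do
    # The real unit would terminate its parent Supervisor here.
    {:stop, :normal, state}
  end

  # A message from a timer that was cancelled too late; ignore it.
  def handle_info({:timeout, _stale_token}, state), do: {:noreply, state}

  # Cancels the current timer (if any) and starts a new one whose length
  # depends on whether the unit is paused.
  defp schedule(state) do
    case state.timer do
      {ref, _token} -> Process.cancel_timer(ref)
      nil -> :ok
    end

    ms = if state.paused?, do: state.paused_ms, else: state.inactivity_ms
    token = make_ref()
    ref = Process.send_after(self(), {:timeout, token}, ms)
    %{state | timer: {ref, token}}
  end
end
```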

Right now, that whole thing is “untested”, which simply means it has no automated tests. It works fine, but it’s also not that complicated at the moment. It might get a few more features in the future, though.

I would like to write some simple integration tests, but I am unsure how to deal with the timing aspect.

One thing to note is that I can manually override the timers in the Timeout GenServer. That was needed to reinitiate them after a reboot (e.g. a deployment). I am sure that works to my advantage when it comes to testing, but I still have no idea how best to structure the test.

The quick and dirty way would be this:

  1. Start “unit”
  2. Send input to “unit”
  3. Manually override timeouts
  4. Process.sleep() in the test module
  5. Assert expected state

But using Process.sleep() in the test module (even if it’s only for 1 second) feels quite dirty.

Any other suggestions?

Instead of sleep/assert, monitor the unit’s supervisor in the test and get notified when it stops. Once the supervisor has stopped, everything else should be gone as well if it is properly linked, but you could also check that afterwards.
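A minimal ExUnit sketch of that suggestion. A hypothetical start_unit/1 helper stands in for the real unit (a Supervisor whose only child stops it after timeout_ms). The test monitors the supervisor, waits for the :DOWN message with assert_receive instead of sleeping, and then checks that the former children are gone as well; since a Supervisor exits only after its children have terminated, they are already dead when :DOWN arrives.

```elixir
ExUnit.start()

defmodule UnitMonitorTest do
  use ExUnit.Case, async: true

  # Hypothetical stand-in for the real unit: a Supervisor whose only child
  # stops it once `timeout_ms` has passed.
  defp start_unit(timeout_ms) do
    {:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
    stopper = {Task, fn -> Process.sleep(timeout_ms); Supervisor.stop(sup) end}
    {:ok, _task} = Supervisor.start_child(sup, stopper)
    sup
  end

  test "unit supervisor and its children are gone after the timeout" do
    sup = start_unit(50)
    children = for {_id, pid, _type, _modules} <- Supervisor.which_children(sup), do: pid
    ref = Process.monitor(sup)

    # Blocks only until :DOWN arrives; fails if it does not arrive within 1 s.
    assert_receive {:DOWN, ^ref, :process, ^sup, :normal}, 1_000

    # A Supervisor exits only after its children have terminated.
    for pid <- children, do: refute(Process.alive?(pid))
  end
end
```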

That is a good idea. In general, I can just check the children of the root DynamicSupervisor to see whether the Supervisor specific to the unit is gone. That would be step 5 of my example above.
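A sketch of the root-level check, assuming the unit runs under a root DynamicSupervisor. UnitRootTest.Unit is a hypothetical stand-in whose only child asks the root to terminate the unit (via DynamicSupervisor.terminate_child/2) after timeout_ms. Because the root removes the child while handling that call, it has already forgotten the unit by the time it answers the test’s which_children/1 request.

```elixir
ExUnit.start()

defmodule UnitRootTest do
  use ExUnit.Case, async: true

  defmodule Unit do
    # Hypothetical stand-in for the real unit: started under the root
    # DynamicSupervisor, its only child asks the root to terminate the unit
    # once `timeout_ms` has passed.
    use Supervisor, restart: :temporary

    def start_link({root, timeout_ms}),
      do: Supervisor.start_link(__MODULE__, {root, timeout_ms})

    @impl true
    def init({root, timeout_ms}) do
      sup = self()

      stopper =
        {Task,
         fn ->
           Process.sleep(timeout_ms)
           DynamicSupervisor.terminate_child(root, sup)
         end}

      Supervisor.init([stopper], strategy: :one_for_one)
    end
  end

  test "unit is no longer a child of the root after the timeout" do
    root = start_supervised!({DynamicSupervisor, strategy: :one_for_one})
    {:ok, unit} = DynamicSupervisor.start_child(root, {Unit, {root, 50}})
    ref = Process.monitor(unit)

    assert_receive {:DOWN, ^ref, :process, ^unit, _reason}, 1_000

    # Step 5: ask the root for its children instead of inspecting the unit.
    pids = for {_id, pid, _type, _modules} <- DynamicSupervisor.which_children(root), do: pid
    refute unit in pids
  end
end
```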

So, I should monitor the unit Supervisor and use assert_receive instead of Process.sleep(), right? Does assert_receive block the process as well? If so, there is not a huge difference from sleep/assert, and it exposes a bit more of the underlying mechanics (since I assert that the Supervisor shuts down instead of asking the system for the list of running units).

Implementation-wise it is quite different, but in the sense that you still have to wait, it is not. Given that you want to test a timeout, though, I’m not sure how you would avoid waiting for things to happen. Also, the monitor sends the test a message as soon as the supervisor exits, so there is no need to guess a sleep() duration and no need to sleep longer than necessary.
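To make the blocking behaviour concrete: assert_receive does block the test process, but its timeout is only an upper bound, and it returns as soon as a matching message arrives. The sketch below, using a hypothetical start_unit/1 stand-in that stops after about 50 ms, measures how long the test actually waits.

```elixir
ExUnit.start()

defmodule AssertReceiveTimingTest do
  use ExUnit.Case, async: true

  # Hypothetical stand-in for the real unit: a Supervisor whose only child
  # stops it once `timeout_ms` has passed.
  defp start_unit(timeout_ms) do
    {:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
    stopper = {Task, fn -> Process.sleep(timeout_ms); Supervisor.stop(sup) end}
    {:ok, _task} = Supervisor.start_child(sup, stopper)
    sup
  end

  test "assert_receive returns as soon as the message arrives" do
    sup = start_unit(50)
    ref = Process.monitor(sup)
    started = System.monotonic_time(:millisecond)

    # 5_000 ms is only the failure deadline; the call returns when :DOWN
    # arrives, roughly 50 ms later here.
    assert_receive {:DOWN, ^ref, :process, ^sup, :normal}, 5_000

    elapsed = System.monotonic_time(:millisecond) - started
    assert elapsed < 1_000
  end
end
```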

Ok, that’s a good point. I will go that route then.

I have been in this situation as well, and what I did was to:

  • Allow timers to be configured in some way (e.g. using Specify; this was actually one of the reasons I created that library).
  • In my tests, reduce the timeouts from hours, minutes or seconds to a couple of milliseconds.
    • Keep in mind that the timeout should be long enough that race conditions do not occur. In essence, it should be an order of magnitude larger than the time it takes to do the actual non-timeout work. Making it larger than that, however, only makes your tests slower than necessary.
  • Now, in your tests, let your test listener sleep for slightly longer than the configured timeout (or use assert_receive), and it will be able to properly test that the GenServer(s) time out as expected.
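A minimal sketch of the same idea using the application environment rather than Specify (whose API is not shown here). The :my_app application, the :inactivity_timeout_ms key and the Unit.ConfigurableTimeout module are hypothetical, and in a Mix project the test value would more typically live in config/test.exs. The value is read when the timer is scheduled rather than stored in a module attribute, because a module attribute would be fixed at compile time.

```elixir
ExUnit.start()

defmodule Unit.ConfigurableTimeout do
  # Hypothetical Timeout GenServer whose inactivity timeout comes from the
  # application environment, defaulting to 30 minutes.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    Process.send_after(self(), :inactivity_timeout, inactivity_ms())
    {:ok, %{}}
  end

  @impl true
  def handle_info(:inactivity_timeout, state), do: {:stop, :normal, state}

  # Read at runtime so that tests can override it.
  defp inactivity_ms do
    Application.get_env(:my_app, :inactivity_timeout_ms, :timer.minutes(30))
  end
end

defmodule ConfiguredTimeoutTest do
  # Not async: the application environment is global state.
  use ExUnit.Case, async: false

  setup do
    # Test-only value in milliseconds instead of minutes.
    Application.put_env(:my_app, :inactivity_timeout_ms, 20)
    on_exit(fn -> Application.delete_env(:my_app, :inactivity_timeout_ms) end)
  end

  test "times out after the configured (short) timeout" do
    {:ok, pid} = Unit.ConfigurableTimeout.start_link()
    ref = Process.monitor(pid)
    assert_receive {:DOWN, ^ref, :process, ^pid, :normal}, 200
  end
end
```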