How to test a GenServer?

Fl4m3Ph03n1x · March 22, 2019, 2:02pm

Background

Let’s assume I have a typical GenServer that receives messages as requests, does some operation in a DB and returns responses.

I want to know what are the best practices for testing GenServers (or Agents) in Elixir (if there are any) and which tools people use to do so.

My approach

If we follow the accidental actor model that Elixir and Erlang have implemented, then testing would be straightforward: simply make sure that your actor sends the correct number of messages to its collaborators once it gets a request.

Problems with my approach

Even ignoring the fact that the actor model is quite accidental in Erlang/Elixir and therefore lacks some of the tools to inspect communications (as in, you’d have to create them yourself), this approach has one big problem: it would require me to follow the actor model.

This may seem contradictory, but if we follow such a model strictly we will end up using processes to save and manage state, instead of using them because of the run-time benefits they can offer. This is an issue.

However, I don’t know of a better idea to test a GenServer. How do you do it?

blatyo · March 22, 2019, 2:36pm

I’m not sure if I fully understand your question. But here’s my attempt anyways.

When I’m testing GenServers, I just call the callbacks myself. So there’s generally no process involved other than the testing process.

Here’s an example of that style of test:

github.com

conduitframework/conduit_sqs/blob/master/test/conduit_sqs/poller_test.exs

defmodule ConduitSQS.PollerTest do
  use ExUnit.Case, async: true
  import Injex.Test
  import ExUnit.CaptureLog
  alias ConduitSQS.Poller
  alias Conduit.Message

  describe "init/1" do
    test "sets itself as a producer and stores it's state" do
      queue = "conduitsqs-test"
      subscriber_opts = []
      adapter_opts = []

      assert Poller.init([Broker, queue, subscriber_opts, adapter_opts]) == {
               :producer,
               %Poller.State{
                 broker: Broker,
                 queue: queue,
                 subscriber_opts: subscriber_opts,
                 adapter_opts: adapter_opts

This file has been truncated. show original

And the module it’s testing:

github.com

conduitframework/conduit_sqs/blob/master/lib/conduit_sqs/poller.ex

defmodule ConduitSQS.Poller do
  @moduledoc """
  Handles demand from Workers by polling an SQS queue for messages
  """
  use GenStage
  import Injex
  require Logger
  inject :meta, ConduitSQS.Meta
  inject :sqs, ConduitSQS.SQS

  defmodule State do
    @type t :: %__MODULE__{
            broker: module,
            queue: String.t(),
            subscriber_opts: Keyword.t(),
            adapter_opts: Keyword.t(),
            demand: pos_integer
          }

    @moduledoc false

This file has been truncated. show original

jola · March 22, 2019, 5:24pm

I agree with @blatyo, test the callbacks or module code directly.

However, if you need to control the GenServers in your tests, e.g. if they keep state and you want to reset it between tests, there’s start_supervised!, which handles starting and stopping the processes between tests.
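
A minimal sketch of the start_supervised! approach, written as a standalone script. The Counter module and its functions are hypothetical and exist only for illustration. Because the process is registered under a fixed name, the second test would fail to start it if ExUnit had not already stopped the instance from the first test.

```elixir
# Standalone script: start ExUnit without autorun and run the suite explicitly below.
ExUnit.start(autorun: false)

defmodule Counter do
  # Hypothetical GenServer used only for this example.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def increment, do: GenServer.call(__MODULE__, :increment)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
end

defmodule CounterTest do
  use ExUnit.Case

  setup do
    # ExUnit stops this process when the test ends, so every test gets a fresh counter.
    start_supervised!({Counter, 0})
    :ok
  end

  test "starts from the initial value" do
    assert Counter.increment() == 1
  end

  test "does not see state left over from another test" do
    assert Counter.increment() == 1
    assert Counter.increment() == 2
  end
end

# In a Mix project `mix test` does this for you.
result = ExUnit.run()
```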

Qqwy · March 22, 2019, 10:12pm

Elixir and ExUnit have quite a number of tools available to them to inspect communications. It is very common to test GenServers by performing a call or cast (or a logical sequence of these) to them, and check the message inbox of the testing process for the results using for instance ExUnit’s assert_receive.

So:

Test the stateless parts of a GenServer by calling the callback-functions directly.
Test the stateful interaction with a GenServer by starting it and writing a slightly higher-level test that performs one or a couple of calls/casts to this GenServer before checking the results.
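
As a rough illustration of the second point, the sketch below starts a process, sends it a short sequence of casts, and checks the test process’s mailbox with assert_receive. The Pinger module and its message shapes are made up for this example.

```elixir
ExUnit.start(autorun: false)

defmodule Pinger do
  # Hypothetical GenServer: answers every ping with a numbered pong.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0)
  def ping(server, reply_to), do: GenServer.cast(server, {:ping, reply_to})
  def count(server), do: GenServer.call(server, :count)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_cast({:ping, reply_to}, count) do
    send(reply_to, {:pong, count + 1})
    {:noreply, count + 1}
  end

  @impl true
  def handle_call(:count, _from, count), do: {:reply, count, count}
end

defmodule PingerTest do
  use ExUnit.Case, async: true

  test "replies to each ping and keeps count" do
    pid = start_supervised!({Pinger, []})

    # A short logical sequence of casts, then assertions on what arrived.
    Pinger.ping(pid, self())
    Pinger.ping(pid, self())

    assert_receive {:pong, 1}
    assert_receive {:pong, 2}
    assert Pinger.count(pid) == 2
  end
end

result = ExUnit.run()
```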

Virviil · March 23, 2019, 7:31am

From a general testing point of view, testing callbacks seems to be an incredibly bad idea.

When writing a unit test, one should understand what the unit is. From the OTP point of view, a single GenServer is a single unbreakable unit - a kind of black box, which receives and sends messages, holds state and does the job.

By the way, it’s the same description as for a class instance in an OOP language.

Have you ever heard of instance methods being tested without an instance? No, of course not. So why do you think you can break apart an entire GenServer implementation? What would you call this kind of testing - sub-unit, half-unit, under-unit?

Now: when you see that you can’t test a GenServer as a black box - what does that mean?
It means that your implementation is very tightly coupled.

In the OOP world, you think a lot about dependency injection, composition, and all this kind of stuff - why do you forget about them here? Because it’s a functional language?

Now, as you see, your question is not about how to test a GenServer but about how to create a TESTABLE GenServer. The answer is simple, because OOP guys already know the answer!

GenServer callbacks ARE wrapper code, an adapter for OTP stuff. This adapter should adapt another module, which implements the real business logic and knows nothing about OTP. This module should not know whether it’s called via messages, callbacks or anything like this. Think of it as a pure functional module. Surely you know how to test that?)
But we still need to test our GenServer communications. So, what to do? We should replace our tight coupling with loose coupling. By dependency injection, of course! Do you remember the pure functional module from the previous step? It has an interface, which in the Elixir world is called a behaviour. Make your GenServer call not a specific module, but any module that implements this behaviour.
Mock all heavy calculations using the newly defined behaviour. You already know that they work - they are tested inside your pure functional module. You just need to test cross-process communication now. Use the Mox library from Plataformatec for this.
???
Profit! You have fully tested loose coupled system!

Enjoy)
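
The steps above could look roughly like the following sketch. To keep it runnable without dependencies it uses a hand-written stub module instead of Mox, so it only approximates what a Mox-based test would do. Pricing, Pricing.Real, Pricing.Stub and CartServer are hypothetical names.

```elixir
defmodule Pricing do
  # The behaviour is the interface the GenServer depends on.
  @callback total(items :: [number]) :: number
end

defmodule Pricing.Real do
  # Pure business logic: knows nothing about OTP and is tested on its own.
  @behaviour Pricing

  @impl true
  def total(items), do: Enum.sum(items)
end

defmodule Pricing.Stub do
  # Test double: returns a canned value so only the message flow is exercised.
  @behaviour Pricing

  @impl true
  def total(_items), do: 42
end

defmodule CartServer do
  # Thin OTP adapter: the pricing module is injected through start options.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def add(server, price), do: GenServer.call(server, {:add, price})
  def total(server), do: GenServer.call(server, :total)

  @impl true
  def init(opts) do
    {:ok, %{pricing: Keyword.get(opts, :pricing, Pricing.Real), items: []}}
  end

  @impl true
  def handle_call({:add, price}, _from, state) do
    {:reply, :ok, %{state | items: [price | state.items]}}
  end

  def handle_call(:total, _from, state) do
    {:reply, state.pricing.total(state.items), state}
  end
end

# With the stub injected, the result proves the GenServer delegated to the behaviour.
{:ok, cart} = CartServer.start_link(pricing: Pricing.Stub)
:ok = CartServer.add(cart, 5)
42 = CartServer.total(cart)
```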

blatyo · March 23, 2019, 4:48pm

To be clear, I think the things you’ve suggested are useful techniques. I think the way you’ve written it implies that it is the only “right” way and that it is comprehensive in solving all problems related to testing GenServers. That might be a mischaracterization of your intent and if so, I apologize.

I’ll disagree with this for two very specific reasons.

The first is, some processes are never called by any other process. Instead they call themselves periodically based on their state. For those, they generally only have handle_info calls. I think it is perfectly acceptable to call them directly.

The second is, when you have many processes communicating with a process it introduces non-determinism. That non-determinism makes it incredibly hard to test particular types of situations. For example, imagine I have 3 processes, A, B, and C. A and B both send messages to C. If the messaging cadence is A, A, B, B, everything works fine, but when it’s A, B, A, B, things will break. How should I test this scenario? If I’m depending on sending messages with actual processes, I have no guarantee that the messages will be sent in the desired order. However, if I call the callbacks directly myself, I do:

{:ok, state} = C.init(opts)
{:noreply, state} = C.handle_info({:blah, A, 1}, state)
# assertion about state
{:noreply, state} = C.handle_info({:blah, B, 1}, state)
# assertion about state
{:noreply, state} = C.handle_info({:blah, A, 2}, state)
# assertion about state
{:noreply, state} = C.handle_info({:blah, B, 2}, state)
# assertion about state
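
For a version that can be run as-is, here is a small sketch with a hypothetical OrderLog module standing in for C. It only records the order in which messages were handled, which is enough to show that no process is started and the interleaving is fully under the test’s control.

```elixir
defmodule OrderLog do
  # Hypothetical stand-in for process C: remembers who sent what, in order.
  use GenServer

  @impl true
  def init(_opts), do: {:ok, []}

  @impl true
  def handle_info({:blah, sender, n}, log), do: {:noreply, log ++ [{sender, n}]}
end

# The callbacks are plain functions, so the A, B, A, B interleaving is forced by hand.
{:ok, state} = OrderLog.init([])
{:noreply, state} = OrderLog.handle_info({:blah, :a, 1}, state)
{:noreply, state} = OrderLog.handle_info({:blah, :b, 1}, state)
{:noreply, state} = OrderLog.handle_info({:blah, :a, 2}, state)
{:noreply, state} = OrderLog.handle_info({:blah, :b, 2}, state)

[{:a, 1}, {:b, 1}, {:a, 2}, {:b, 2}] = state
```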

You seem to be arguing here for something like the following:

defmodule MyGenServer do
  @pure MyPureFunctions # pretend this gets injected with something like mox

  def init(opts) do
    {:ok, @pure.init(opts)}
  end

  def handle_call(:blah, from, state) do
    case @pure.blah(state) do
      {:now, response, state} ->
        {:reply, response, state}
      {:later, state} ->
        Process.send_after(self(), {:blah_for_real, from}, 200)
        {:noreply, state}
    end
  end
end

I wrote handle_call in a specific way to illustrate some points. One is, if you keep @pure pure, you necessarily keep some logic in the GenServer itself that needs to be tested. You didn’t necessarily say this, but I feel it’s implied that, if you pull out pure state calculations it means your GenServer is dumb. That would clearly not be the case in every situation.

Second, @pure.blah/1 in my example is aware that it needs to split its answer into two parts: the response and the state. This implies some knowledge of the calling context. On the other hand, it could return a single value that contains the state and the response. That then moves some logic into the GenServer, which adds something to be tested in the GenServer itself. The now and later results are also allowing the pure part to decide how the GenServer should respond. Perhaps the GenServer could instead infer that from the state or response, but that also puts some logic back into the GenServer. In any case, I feel it’s optimistic to assume you can always avoid knowledge of the calling context.

Third, GenServers often do things that are impure, like sending messages to other processes. If your state or response is derived from calling another process, you could presumably mock that process. However, in order to simulate the statefulness of that process you’re mocking out, you’d necessarily need another process.

I personally feel that the testing strategy most people use for their processes is overly simplistic. They’ll send one message to the process or only have one process communicating with their process. There are certain types of bugs that only occur after a chain of messages has happened or when multiple processes are communicating with the same process. The only way I have found to deterministically test these situations is by calling the callbacks directly and making assertions about their result and side effects. I’m woefully unaware of property testing, specifically stateful property testing, which I imagine would allow you to discover that these situations exist, though not necessarily deterministically reproduce them.

I do also want to make clear, I don’t think calling callbacks is the only way to test a GenServer. However, when I do use messages, they tend to be integration tests or the GenServer is simple enough that I’m sure determinism won’t be a problem.

spencerdcarlson · March 3, 2021, 1:10am

Do you have a link to an example of this kind of testing on a GenServer? I’d love to see a distilled pattern if possible. The Mox examples are pretty nice.

Sebb · March 3, 2021, 5:08am

Me too.

I’m starting with Elixir (and functional programming). At first I didn’t think too much (actually not at all) about testability, and I read somewhere to use as many processes as possible. So I did that and it was fun until it wasn’t. Then I read a bit about functional programming and moved as much as possible into small pure functions. This was obviously a big step forward. My code is now divided into Rivendell (the pure, clean, easy to read, understand and test part) and … Mordor (the processes with all the side-effects and ugly stuff, there is even something involving timers).

I tried some things to test them, but none of it feels good. And, because the processes are so small, they just work and I’m convinced that just by looking at them I can prove them correct. But now one of them does not work. So I’m back at this testing-GenServers problem.

So I’d really like to have a guide how to build a testable GenServer and how to test it.

ityonemo · March 3, 2021, 5:21am

I think you should be writing as few GenServers as possible. IMO, for 90% of webapps, you should not have ANY GenServers (besides the ones that your framework/library gives you).

When I write GenServers, they are either connections or a representation of something IRL, and are basically a “state cache and smart management layer”. But to specifically test the GenServer, I literally build one out, “do things to it” and make sure they do what’s expected (erps/erps_test.exs at master · ityonemo/erps · GitHub). If you need to probe the state, then you can include a “dump” call which will dump the internal state. If you need lots of these tests, it may be worthwhile to use a setup block where you’re starting the GenServer. If they need to handle failure, I write “OTP tests” where I kill the GenServer and make sure it comes back in an expected state (erps/otp_test.exs at master · ityonemo/erps · GitHub).
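
A minimal sketch of both ideas, the “dump” call and the kill-and-recover “OTP test”, is shown below. It is not taken from the linked repository; Cache and Wait are hypothetical modules, and the polling helper is just one simple way to wait for the supervisor to restart the child.

```elixir
defmodule Cache do
  # Hypothetical GenServer with a test-oriented :dump call.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})
  # Returns the whole internal state so tests can probe it.
  def dump, do: GenServer.call(__MODULE__, :dump)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:put, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
  def handle_call(:dump, _from, state), do: {:reply, state, state}
end

defmodule Wait do
  # Polls until fun returns true, giving up after roughly half a second.
  def until(fun, attempts \\ 50) do
    cond do
      fun.() ->
        :ok

      attempts == 0 ->
        raise "condition not met in time"

      true ->
        Process.sleep(10)
        until(fun, attempts - 1)
    end
  end
end

{:ok, _sup} = Supervisor.start_link([Cache], strategy: :one_for_one)

# "Do things to it" and probe the state through the dump call.
:ok = Cache.put(:a, 1)
true = Cache.dump() == %{a: 1}

# OTP test: kill the process and check that it comes back in the expected state.
old_pid = Process.whereis(Cache)
Process.exit(old_pid, :kill)

:ok =
  Wait.until(fn ->
    pid = Process.whereis(Cache)
    is_pid(pid) and pid != old_pid
  end)

true = Cache.dump() == %{}
```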

Sebb · March 3, 2021, 8:18am

Yes, I’ve learned that the hard way already. Also, I think the point is not to have as few processes running as possible, but really to have as few GenServer modules as possible. So better a simple GenServer that requires more instances to be spawned than a more complex implementation that requires fewer spawned processes.

I’ll look into the GenServers I’ve built and check if they match this definition.

This solves one of the challenges I had with testing processes.
ExUnit can only assert_receive messages the test-process receives.
So hard wiring processes prevents testing (or makes it harder).
But injecting the receivers makes the code more complex and it may only be there for testing.
Do you think we should nonetheless always inject the receivers of messages the GenServer-under-test sends, to make it testable? (And also take care that these messages are properly tagged, because the test will be confused if receiver one and receiver two (which both are self()) receive the same message.)

Again, changing the code for testability. Most likely it’s worth it.
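
To make the idea of injected, tagged receivers concrete, here is a small sketch. Thermostat, its options (:logger, :alarm, :limit) and its message tags are hypothetical. Both receivers are the test process, and the distinct tags keep the two kinds of message apart.

```elixir
ExUnit.start(autorun: false)

defmodule Thermostat do
  # Hypothetical GenServer: the pids it notifies are passed in as options.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def report(server, temp), do: GenServer.call(server, {:report, temp})

  @impl true
  def init(opts) do
    {:ok,
     %{
       logger: Keyword.fetch!(opts, :logger),
       alarm: Keyword.fetch!(opts, :alarm),
       limit: Keyword.get(opts, :limit, 30)
     }}
  end

  @impl true
  def handle_call({:report, temp}, _from, state) do
    # Each outgoing message carries its own tag, so one test process can play both roles.
    send(state.logger, {:logged, temp})
    if temp > state.limit, do: send(state.alarm, {:alarm, temp})
    {:reply, :ok, state}
  end
end

defmodule ThermostatTest do
  use ExUnit.Case, async: true

  test "logs every reading and raises an alarm only above the limit" do
    pid = start_supervised!({Thermostat, logger: self(), alarm: self(), limit: 30})

    assert :ok = Thermostat.report(pid, 20)
    assert_receive {:logged, 20}
    refute_received {:alarm, _}

    assert :ok = Thermostat.report(pid, 35)
    assert_receive {:logged, 35}
    assert_receive {:alarm, 35}
  end
end

result = ExUnit.run()
```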

There are still some things I don’t know how to handle:

the GenServer …

calls a module that’s pure and already tested. I do not want to test it again. So use Mox as proposed above?
accesses an external resource, an API maybe…
does something with date or time. I’ve done some work with a C-actor-framework. There testing time was easy, because timers work this way: arming a timer just puts a tuple {<actor>, <event>, <ticks>} in a list. There is a manager for these timers that itself receives tick-events from the hardware. On each tick it decreases the ticks of all timers in the list and if one becomes zero it sends <event> to <actor>. So in testing you can just take control over time, for example send 100 ticks to the timer-manager.
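
The tick-based timer scheme described in the last item could be sketched in Elixir as below. This is an illustration of the idea only, not how Erlang/OTP timers work; TickTimers and its functions are hypothetical. Because time only advances when tick is called, a test decides exactly when a timer fires.

```elixir
defmodule TickTimers do
  # Each armed timer is a tuple {target_pid, event, ticks_left}.

  def arm(timers, target, event, ticks) when is_integer(ticks) and ticks > 0 do
    [{target, event, ticks} | timers]
  end

  # One tick: fire timers on their last tick, count the others down.
  def tick(timers) do
    Enum.flat_map(timers, fn
      {target, event, 1} ->
        send(target, event)
        []

      {target, event, ticks} ->
        [{target, event, ticks - 1}]
    end)
  end

  # Advance time by n ticks at once.
  def tick(timers, n) when is_integer(n) and n >= 0 do
    Enum.reduce(1..n//1, timers, fn _, acc -> tick(acc) end)
  end
end

# The test is in control of time: nothing fires until the 100th tick.
timers = TickTimers.arm([], self(), :connection_timeout, 100)
timers = TickTimers.tick(timers, 99)
false = Enum.empty?(timers)

[] = TickTimers.tick(timers, 1)

receive do
  :connection_timeout -> :ok
after
  0 -> raise "timer did not fire"
end
```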

ityonemo · March 3, 2021, 8:44am

I dunno. It sounds like your system is still more complicated than it needs to be. I built a multi-datacenter virtual machine orchestrator in Elixir with very high test coverage and it didn’t have to do anything like what you are describing. The most complicated thing was figuring out how to shard Registries so that asking the registry to list would only show other genservers created in the same test.

If your GenServer is emitting a message, then you should probably implement some way of overriding the target and setting it to the test process. But for the most part, you should only be hitting the GenServer with calls, so returning the message is baked in to the protocol. IMO if you are casting to GenServer or using raw messages (send/handle_info) you’re probably doing something wrong unless you can justify why you really really need the cast or raw message. handle_info is strictly for when your GenServer is consuming another process’s message API that it has subscribed to.

calls a module that’s pure and already tested. I do not want to test it again.

You should probably roll with it; if it’s pure then the result is deterministic. You don’t have to try all possible code branches inside the called module, just the code branches inside your GenServer code.

accesses an external resource, an API maybe…

Mox. Don’t forget to register an allowance.

does something with date or time

Always difficult.

I typically don’t like to use timers in my GenServers; if I need one I will reach for :gen_statem, or more likely, the opinionated :gen_statem library that I wrote called StateServer.

LostKobrakai · March 3, 2021, 8:56am

With processes I tend to test in stages:

If the internal computation is complex then have it be handled by another module with pure functions – test those.

Then testing the statefulness of a single process. Start one process per test and assert the interface of the otherwise black box – meaning test whatever the message interface of the process is without ever looking into the process’s state. This might need some accompanying code in the implementation to adjust things like timeouts, third processes being sent messages or access to test specific resources like ecto sandbox.
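
One possible shape for the “accompanying code” mentioned in the second stage: make the timeout and the notified process configurable, so a test can shorten the wait and observe the result from outside. Session, :owner and :idle_timeout are hypothetical names. The sketch relies on the standard GenServer timeout value in callback return tuples.

```elixir
defmodule Session do
  # Hypothetical GenServer that expires after a period without any calls.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def touch(server), do: GenServer.call(server, :touch)

  @impl true
  def init(opts) do
    state = %{
      owner: Keyword.fetch!(opts, :owner),
      # Production default is long; tests pass a few milliseconds instead.
      idle_timeout: Keyword.get(opts, :idle_timeout, 30_000)
    }

    {:ok, state, state.idle_timeout}
  end

  @impl true
  def handle_call(:touch, _from, state), do: {:reply, :ok, state, state.idle_timeout}

  @impl true
  def handle_info(:timeout, state) do
    send(state.owner, {:session_expired, self()})
    {:stop, :normal, state}
  end
end

# Black-box check: only the message interface is observed, never the state.
{:ok, session} = Session.start_link(owner: self(), idle_timeout: 50)
:ok = Session.touch(session)

receive do
  {:session_expired, ^session} -> :ok
after
  1_000 -> raise "session did not expire"
end
```

Inside an ExUnit test the receive block would typically be replaced by assert_receive with an explicit timeout.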

And the third layer is more integration test style for starting a whole assembly of processes and asserting on their combined public interface (likely still messages). Here individual messages between processes under test are no longer relevant. Only thing counting is that the whole system does what can be asserted on from the outside.

Sebb · March 3, 2021, 9:39am

Most likely.

I’ll look at my code and see where this is possible. Makes sense that this is easier.

Yes, it is. But sometimes it’s just needed. I’m implementing a connection-oriented protocol, which has to time out after some time. Does :gen_statem make testing timers easier in any way?

Thanks for your help, I’ll refactor my code and tests with the things I learned and will be back with easier code, more tests and more questions.

ityonemo · March 3, 2021, 3:38pm

Ah! A connection oriented protocol. Classic use case for GenServer. I recommend using Connection. Connection — connection v1.1.0

And if you want both tcp/tls i wrote Transport (though it’s not been tested with the new ssl 1.3) Transport — transport v0.1.0

Sebb · March 3, 2021, 4:36pm

Connection looks interesting, but it will not help, because I’m implementing a protocol stack (EN 50090 - Wikipedia), Connection builds upon an existing stack.

I just had a quick look over my GenServers and I already see some points I can simplify following your hints. I’ll be back later when that’s done.

ityonemo · March 3, 2021, 8:17pm

Happy to help! If you prefer not making it public feel free to DM me, too. I’m in the process of making a YouTube series on Elixir concurrency, and I have deliberately pushed GenServers to the end, because they are tricky and I actively want to discourage people from using them, but it would be nice to know common things that people do that may or may not be the best.

Sebb · March 4, 2021, 8:54am

I could make the stack public, but only if I’m sure it’ll work out in the end.
Right now I’m not sure if I will complete it, I’m just looking into Elixir/BEAM to see what it can do in the embedded space. (preliminary result: it may reduce the tools and languages we use drastically)
I’ll make a minimal example out of the GenServer that is most complex.

rogerweb · August 2, 2022, 4:58pm

So, just double checking if a GenServer makes sense in my scenario:

I have a number of users connected to my application via websocket (Phoenix channels). A separated process consumes messages from AWS SQS (a FIFO queue) and sends each message to the target connected user. For each sent and received message the application receives an “ack” back in the channel. At that point I need to delete the message from SQS. I could call SQS’s API to delete the message right there, in-process, individually. However, it looks the best practice is to batch these deletes, sending up to 10 message IDs in one single request to SQS.

So I created a GenServer that receives a message ID via cast and appends it to a list. Once the list has 10 IDs or a timer expires after, let’s say, 1 second, it deletes the 10 or fewer messages in one go in SQS and empties the list.
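
A rough sketch of such a batching GenServer, written with testing in mind: the delete operation is injected as a function and the batch size and interval are options. DeleteBatcher and its options are hypothetical, and no real SQS call is made. The stand-in function just reports each batch to the calling process.

```elixir
defmodule DeleteBatcher do
  # Hypothetical batching GenServer; :flush is the function that would call SQS.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def ack(server, id), do: GenServer.cast(server, {:ack, id})

  @impl true
  def init(opts) do
    {:ok,
     %{
       flush: Keyword.fetch!(opts, :flush),
       max: Keyword.get(opts, :max, 10),
       interval: Keyword.get(opts, :interval, 1_000),
       ids: [],
       timer: nil
     }}
  end

  @impl true
  def handle_cast({:ack, id}, state) do
    state = ensure_timer(%{state | ids: [id | state.ids]})

    if length(state.ids) >= state.max do
      {:noreply, flush(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_info(:flush, state), do: {:noreply, flush(%{state | timer: nil})}

  # Arm the timer when the first ID of a new batch arrives.
  defp ensure_timer(%{timer: nil} = state) do
    %{state | timer: Process.send_after(self(), :flush, state.interval)}
  end

  defp ensure_timer(state), do: state

  defp flush(%{ids: []} = state), do: state

  defp flush(state) do
    if state.timer, do: Process.cancel_timer(state.timer)
    # IDs were prepended, so reverse to keep arrival order.
    state.flush.(Enum.reverse(state.ids))
    %{state | ids: [], timer: nil}
  end
end

test_pid = self()
report = fn ids -> send(test_pid, {:deleted, ids}) end
{:ok, batcher} = DeleteBatcher.start_link(flush: report, max: 3, interval: 50)

# A full batch is flushed immediately.
Enum.each(["a", "b", "c"], &DeleteBatcher.ack(batcher, &1))

receive do
  {:deleted, ["a", "b", "c"]} -> :ok
after
  1_000 -> raise "full batch was not flushed"
end

# A partial batch is flushed when the timer expires.
DeleteBatcher.ack(batcher, "d")

receive do
  {:deleted, ["d"]} -> :ok
after
  1_000 -> raise "partial batch was not flushed"
end
```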

Would you say it’s a fair case for a GenServer?

Is it available? I would like very much to watch it.

ityonemo · August 2, 2022, 5:22pm

It’s a use case for GenServers, but sounds like a better fit for Amazon SQS — Broadway v1.0.3
It’s usually better to use someone else’s GenServer

I didn’t get to GenServers in the series, but I did polish the series and give a talk on it: Isaac Yonemoto - $callers and $ancestors and Tasks oh my! - YouTube

rogerweb · August 2, 2022, 5:28pm

Good! Thanks for the feedback!

I do use Broadway, but unfortunately I can’t use its acknowledger because the acks are received in the channel processes asynchronously and not in Broadway’s callbacks (How to acknowledge a Broadway message asynchronously?)