How to test a supervised GenServer?

Hello. I have a GenServer working correctly. All the test are passing.

This is how I have it defined:

defmodule CashierGenServer do
 def start_link(arg, opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, arg, name: name)
  end
  
   @spec add_to_cart(product_code :: String.t()) :: :ok
  def add_to_cart(product_code) do
    GenServer.cast(__MODULE__, {:add_to_cart, product_code})
  end
  
  ... # a lot more client code, that resembles this
  
  def init(_) do
    state = %{total_price_expected: 0, basket: [], inventory: []}
    {:ok, state, {:continue, :fetch_inventory_from_fake_database}}
  end

  def handle_continue(:fetch_inventory_from_fake_database, state) do
    {:noreply, %{state | inventory: Inventory.retrieve_inventory()}}
  end
  
  ... # a lot more server code but more of the same
  
 end

My tests are defined like this:

defmodule CashierGenServerTest do
  use ExUnit.Case, async: true
  doctest CashierGenServer
  import CashierGenServer

  setup do
    start_supervised(CashierGenServer)
    :ok
  end

  test "application starts with an empty cart and zero initial amount" do
    assert consult_cashier_info()[:basket] == []
    assert consult_cashier_info()[:total_price_expected] == 0
  end
  
   test "application starts loading the products inventory" do
    assert consult_cashier_info()[:inventory] == Inventory.retrieve_inventory()
  end
  
  test "add_to_cart/1 allows adding products to the cart" do
    product = Fixtures.product_fixture(["GR1"])

    assert consult_cashier_info()[:basket] == product
  end

  test "remove_from_cart/1 removes one unit of a product from the cart" do
    [product] = Fixtures.product_fixture(["GR1"])
    remove_from_cart(product)

    assert consult_cashier_info()[:basket] == []
  end
  
  ... # more tests
  
end

AGAIN, ALL OF THESE TESTS ARE PASSING WITH NO ISSUES

But if I supervise this GenServer my tests wonā€™t pass. Some will pass, some will not. Itā€™s like the previous state remains in the test.

This is how I supervised it.

defmodule CashierGenServer.Application do
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      CashierGenServer
    ]

    opts = [strategy: :one_for_one, name: CashierGenServer.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

I noticed that if I do this on my test setup

  setup do
    Application.stop(:cashier_gen_server)
    :ok = Application.start(:cashier_gen_server)
  end

my tests will pass.

  1. Is this the correct way?

  2. If not, what do you recommend me to properly test a supervised GenServer?

3 Likes

Couple of thoughts on this.

  1. Although start_link looks for a :name option, youā€˜re not really using it in the one client API function youā€˜ve provided. I would assume this would cause an issue if youā€˜d actually start it with the option set; your client API calls would fail because the process could not be found.

  2. From the way youā€˜re starting CashierGenServer in your application, it is used as a singleton - is this intended? Iā€˜m asking because it seems like something that should be started per user/session? In that case, take a look at DynamicSupervisor. When using DynamicSupervisor,
    you could spin up a supervised CashierGenServer process just for the test module (make sure to stop it via on_exit).

  3. Some of your tests do not test CashierGenServer directly, but via the Inventory module. I generally like to avoid coupling tests like that. If you want to test that Inventory calls CashierGenServer, you could use a mock (this depends on how the two modules actually interact though). But that would be part of the test suite for Inventory.

4 Likes

Realized my reply may not have actually answered your immediate questionā€¦

Generally, thereā€™s nothing wrong with testing a supervised GenServer like this (above caveats still apply). I wouldnā€™t say it matters if itā€™s supervised or not - aside from testing the cases where it might crash, to make sure it gets restarted etc. You just have to be aware of how many instances of it are running and how to reach them.

The reason why your tests pass in the first case is that start_supervised stops the CashierGenServer before the next test runs. So for every test case, you have a fresh instance of it.

Once you add it to your applicationā€™s supervision tree, you already have one instance of it running before the test suite even executes, so the call to start_supervised should return {:error, reason}, reason being something like ā€œalready startedā€ - because an instance of CashierGenServer with the name :Elixir.CashierGenServer already exists. You can verify that by using start_supervised! instead, which would raise the error. As it is now, this specific error just goes unnoticed. You mentioned that itā€™s like the previous state remains in the test - thatā€™s exactly what happens! No new process with the same name can be started and the tests go ahead using the one thatā€™s still running.

You could avoid this by creating new CashierGenServer processes passing in the :name option, but then your client API functions would fail (see point 1. above). And, again, I donā€™t think this is what you actually want?

Finally, your workaround makes the tests pass again because you force the same initial condition to apply - before each test runs, CashierGenServer is stopped (if itā€™s running) and a new one is started.

So, how to fix this?

First step could be to make the errors with the current setup obvious. Make sure the process is actually started by pattern matching on the returned value:

setup do
  {:ok, _pid} = start_supervised(CashierGenServer)
end

If you actually do intend for your application to be able to create multiple CashierGenServer processes and make the changes to use DynamicSupervisor, you could also update your client API functions to accept some value that you can use to identify the correct process (simplest way being the pid):

setup do
  # create a new process for each test
  {:ok, pid} = start_supervised(CashierGenServer)
  # pass it to the test
  %{cashier_pid: pid}
end

test "some test using its own cashier process", %{cashier_pid: cashier} do
  CashierGenServer.some_api_function(cashier, "a parameter")
  ...
end

Where the implementation would look something like:

defmodule CashierGenServer do
  ...
  def some_api_function(pid, param) do
    GenServer.call(pid, {:some_api_function, param})
  end
 
  def handle_call({:some_api_function, param}, _from, state) do
    # do something
  end

Hope this helps further.

4 Likes

Thanks @awerment, I will analyze and try to understand everything youā€™ve told me here. Iā€™ll let you know when I get to a solution :fist_right: :fist_left: :pray:

2 Likes

Note: actually starting processes during testing often complicates things enormously and you end up testing library code (like GenServer) more often than not. I usually declare the client-side methods ā€œtoo simple to testā€ and just test the server-side methods directly (test whether handle_cast and handle_call methods have the right behavior - thatā€™s a ā€œproperā€ unit test and IMO the most useful level to test on)

4 Likes

Iā€™m only now learning to test GenServers so this is interesting. Iā€™ve only written a few tests so far for a very simple GenServer and am running them supervised.

Do you have any examples of how it gets hairy?

Iā€™ve been thinking about it but it still feels off to me to be testing the callbacks. Not testing the client API gives no protection against regressions. Like, sure you can mandate that the client functions are pure wrappers, but there is always the possibility that something slips into one of them. Do you just rely having them exercised through end-to-ends? Am I missing something else?

Thanks!

1 Like

I can relate. That was my initial instinct too. I think it was the Programming Erlang book that helped me shift my thinking. In that book, Joe Armstrong builds an example server process from low-level primitives. Then he splits it into the generic server and the callback module pieces to show how/why GenServers are designed as they are. Once I understood that well, I recognized that the OTP team has tested the GenServer internals well. Itā€™s framework code. I generally try to avoid testing other framework code and focus on whatā€™s special/specific to my application. That is what the application callback module is to the GenServer ā€“ your code. Your callback moduleā€™s contract with GenServer it tied up in the function names/arities you define and the particular tuples they return. But itā€™s still just a module with certain public functions.

Once I accepted this, invoking the callback moduleā€™s functions directly became a viable test strategy for me.

7 Likes

Thatā€™s an interesting take. I totally agree that testing the callbacks in this case feels like the proper way to unit-test.

As a counter-example, Iā€™ve landed on another approach (often advocated by @pragdave): separating the core logic thatā€™s usually put inside the callbacks into a separate module that only has pure functions with state in ā†’ response and new state out. That way the API as well as the callbacks can be considered ā€œtoo simple to testā€ (Iā€™m sure many will disagree, as will most code coverage tools :sweat_smile:).

To provide a concrete (although way too simplified for sure) example:

You start out with a ā€œrootā€ level API module that the client code calls into exclusively:

defmodule Echo do
  alias Echo.Server

  defdelegate start_link(init_args), to: Server
  defdelegate ping(pid), to: Server
  defdelegate pong(pid), to: Server
  defdelegate count(pid), to: Server
end

The Echo.Server module holds the GenServer implementation:

defmodule Echo.Server do
  use GenServer
  alias Echo.Impl

  @me __MODULE__

  # API

  def start_link(init_args), do: GenServer.start_link(@me, init_args, [])
  def ping(pid), do: GenServer.call(pid, :ping)
  def pong(pid), do: GenServer.call(pid, :pong)
  def count(pid), do: GenServer.call(pid, :count)

  # Callbacks

  def init(init_args), do: {:ok, Impl.init(init_args)}

  def handle_call(:ping, _from, state) do
    {response, new_state} = Impl.ping(state)
    {:reply, response, new_state}
  end

  def handle_call(:pong, _from, state) do
    {response, new_state} = Impl.pong(state)
    {:reply, response, new_state}
  end

  def handle_call(:count, _from, state) do
    {:reply, Impl.count(state), state}
  end
end

(In this over-simplified instance, you could of course do away with the extra layer of API functions and just call into GenServer from the main Echo module.)

Doing this, your core logic (air quotes) can be as simple as:

defmodule Echo.Impl do
  def init(_init_args), do: 0
  def ping(num), do: {:pong, num + 1}
  def pong(num), do: {:ping, num + 1}
  def count(num), do: num
end

And testing it becomes very straight forward - no need to deal with runtime concerns at all. Also, using something like boundary, you could enforce that the client code can only interact with your API module.

As mentioned above, this of course leaves quite a few untested LOC as per some metrics, but you gain a clean separation of concerns and it makes refactoring easier, e.g. replacing GenServer with Agent, changing and maybe more importantly reusing the implementation.

(Edit: just noticed I had a small typo bug in one of the callbacks - count that as a +1 for integration testing :man_shrugging:)

I am with you here, just want to point out that to me testing my GenServers does not become superfluous because of that distinction. Rather it falls into the category of end-to-end / integration testing, and that is plenty valuable because your code could still make an out-of-order messaging error.

1 Like

I completely agree. Thereā€™s value in both types of tests for different purposes. I donā€™t want people to feel ā€œdirtyā€ if they call the callback functions directly for the unit level tests.

2 Likes

I think I didnā€™t express properly why it feels off. I actually did consider testing the callbacks as well, but since I also wanted to test the api, it was totally unnecessary. I actually like, from a unit level, of just testing the callbacks as it does simplify the tests. However, I would still want my client api exercise somewhere, even if implicitly by its consumers, for regression purposes. Are you advocating against even that? If not, donā€™t you have the same problem with having to start gen servers in tests? Or would you use a mock or DI? Did I just answer my own question? :laughing: Even if I did, isnā€™t this still a problem in end-to-ends?

1 Like

I used to advocate strongly for specific meanings of ā€œunit testā€, ā€œintegration testā€, ā€œfeature testā€, etc. However, Iā€™ve given up on that now. Thereā€™s too many ways that people understand the nuances of each. What I find most useful now are ā€œfast testā€ and ā€œslow testsā€. Even that distinction has a gray area in-between :man_shrugging:

When I need a test to cross process boundaries, thereā€™s no single technique that works in all cases. (Note: testing the client api functions in a GenServerā€™s callback module involves 2 processes, even though it is a single module). What I find helpful in many cases though, is to let start_link take optional values for a name and any collaborating processes it communicates with. Yes, itā€™s a compromise to add that in production code to only be used by tests, but Iā€™ve come to accept that. With that in place, you can start a custom instance of the GenServer named according to your testā€™s name. No other code will be communicating with it. You can also mock or DI the collaborators (if any). Itā€™s more effort and it still has async ā€œgotchasā€ but it keeps the GenServer under test all to yourself.

Real end-to-end tests are gonna be slow and youā€™re always going to risk interaction between tests unless you serialize them.

2 Likes
def some_method(pid, arg) do
   GenServer.call(pid, {:message, arg})
end

That code I call ā€œtoo simple to testā€ these days. So yeah, Iā€™m fine skipping it. Iā€™d rather keep my GenServer module small and simple so that in a glance you can see that the call patterns and the handler patterns match (extracting all the hard work to a separate module, test that), than force myself to jump through hoops to achieve some random level of test coverage. Of course, technically, that code accesses a global value and you can make it properly ā€œunitā€ testable by injecting the module. But Iā€™m usually like ā€œmeh, not worth itā€. By the time my code hits production Iā€™ve done some smoke testing in a pre-prod environment anyway and errors between the above code and, say,

def handle_call({:mesage, arg}, _from, state) do
  do_stuff(arg)
end

(note the typo) will very likely be caught in some very mild acceptance testing or in the code review. Iā€™d like to focus my testing effort on things that can actually cause bugs.

(In the six years or so that Iā€™ve coded in Elixir full-time, Iā€™ve gone through several iterations and decided that especially in Elixir, code that is ā€œtoo simple to testā€ is a thing. YMMV)

2 Likes