Why is my GenServer resetting its state between calls in the same test?

I’m using the ideas from this post to set up my tests.

This is my start_link:

  def start_link(init, opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, init, name: name)
  end

And I have this setup in my tests:

  setup do
    child_spec = %{
      id: :test_store,
      start: {Mini.Store, :start_link, [%{}, [name: :test_store]]}
    }

    start_supervised!(child_spec)
    :ok
  end

This is the failing test:

  test "insert multiple values under same key" do
    Mini.Store.put(:key, %{my: :map}, name: :test_store)
    Mini.Store.put(:key, %{other: :map}, name: :test_store)
    assert Mini.Store.get(:key, name: :test_store) == [%{my: :map}, %{other: :map}]
  end

When I run the assertion, I only get the second map: [%{other: :map}].
At first I thought it was a bug in my code, but when I try it in IEx it works as expected.

These are the put and get implementations. They're very ugly, so don't pay attention to that; I just want to point out that I get my server name from the opts param.

  def put(key, value, opts \\ []) do
    sleep = Keyword.get(opts, :sleep, 0)
    name = Keyword.get(opts, :name, __MODULE__)

    existing = get(key)
    concat = existing ++ [value]
    new_list = Enum.reject(concat, fn x -> x == nil end)
    dedup_list = Enum.dedup(new_list)
    GenServer.cast(name, {:put, key, dedup_list, sleep})
  end

  def get(key, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, :infinity)
    name = Keyword.get(opts, :name, __MODULE__)

    try do
      GenServer.call(name, {:get, key}, timeout)
    catch
      :exit, value ->
        {:not_ready, value}
    end
  end

Again, in IEx it works as expected:

iex(1)> Mini.Store.get(:key)           
[]
iex(2)> Mini.Store.put(:key, %{my: :map})    
:ok
iex(3)> Mini.Store.put(:key, %{other: :map})
:ok
iex(4)> Mini.Store.get(:key)                
[%{my: :map}, %{other: :map}]

You’re using cast, which creates a race condition between the get and the subsequent put.

You would typically write this code as a get and set inside the same message. That guarantees you won’t have a race condition.
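A minimal sketch of that idea, in which the client sends only the new value and the server does the read, append and write while handling that single cast. The original callbacks weren't posted, so the module name MergeStore, the message shape and the map-of-lists state are assumptions:

```elixir
defmodule MergeStore do
  # Hypothetical store: state is a map of key => list of values.
  use GenServer

  def start_link(opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, %{}, name: name)
  end

  # Async put: only the new value travels in the message.
  def put(key, value, opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.cast(name, {:put, key, value})
  end

  def get(key, opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.call(name, {:get, key})
  end

  @impl true
  def init(state), do: {:ok, state}

  # Read, append and write all happen while handling one message, so no
  # other put can slip in between them. Enum.dedup/1 drops consecutive
  # duplicates, as the original put/3 did.
  @impl true
  def handle_cast({:put, key, value}, state) do
    {:noreply, Map.update(state, key, [value], &Enum.dedup(&1 ++ [value]))}
  end

  @impl true
  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state, key, []), state}
  end
end

{:ok, _pid} = MergeStore.start_link(name: :merge_store)
:ok = MergeStore.put(:key, %{my: :map}, name: :merge_store)
:ok = MergeStore.put(:key, %{other: :map}, name: :merge_store)
IO.inspect(MergeStore.get(:key, name: :merge_store))
# prints [%{my: :map}, %{other: :map}]
```

Because the server handles one message at a time, two puts can no longer interleave between a read and a write, even with many concurrent callers.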

Edit: this is not the problem, because it’s sequential from the test process. It would need multiple callers in parallel for it to happen.


How would the race condition occur? Both cast and call send a message from A to B, and the BEAM guarantees that messages from one process to another are received in order.

Perhaps it happens if GenServer B is still processing the previous cast when both the cast and the call messages arrive in its mailbox, and GenServer’s code pattern matches call messages first in its receive block. :thinking:
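A quick way to check the ordering question, as a sketch with a hypothetical OrderCounter module. The standard GenServer loop takes messages from its mailbox in arrival order and does not match calls ahead of casts, so a call sent after a batch of casts from the same process is only handled once all of those casts have run:

```elixir
defmodule OrderCounter do
  # Hypothetical counter used only to observe message ordering.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_cast(:inc, count) do
    # Simulate slow work so the casts pile up in the mailbox.
    Process.sleep(1)
    {:noreply, count + 1}
  end

  @impl true
  def handle_call(:get, _from, count), do: {:reply, count, count}
end

{:ok, pid} = OrderCounter.start_link(0)
for _ <- 1..100, do: GenServer.cast(pid, :inc)

# The call sits behind all 100 casts in the mailbox, so it sees every one.
IO.inspect(GenServer.call(pid, :get))
# prints 100
```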

Another option is to use call for put operations as well. Usually you do care whether a put operation on a store succeeded.
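To make the "did it succeed?" point concrete, a small sketch (:no_such_store is a hypothetical name with no process registered under it): a cast returns :ok no matter what, while a call surfaces the failure as an exit the caller can catch, the same way the original get/2 does.

```elixir
# cast never reports failure: even with nothing behind the name, it returns :ok.
:ok = GenServer.cast(:no_such_store, {:put, :key, %{my: :map}})

# call does: with no process registered under the name, the caller gets an
# exit of the form {reason, {GenServer, :call, args}} that it can catch.
result =
  try do
    GenServer.call(:no_such_store, {:put, :key, %{my: :map}})
  catch
    :exit, {reason, _call} -> {:error, reason}
  end

IO.inspect(result)
# prints {:error, :noproc}
```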

The weird thing is that if I add an IEx.pry and test it manually, I have the same problem:

❯ iex -S mix test test/store_test.exs:37
Erlang/OTP 24 [erts-12.0] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Excluding tags: [:test]
Including tags: [line: "37"]

Request to pry #PID<0.391.0> at StoreTest."test insert multiple values under same key"/1 (test/store_test.exs:37)

   35: 
   36:   test "insert multiple values under same key" do
   37:     require IEx; IEx.pry
   38:     Mini.Store.put(:key, %{my: :map}, name: :test_store)
   39:     Mini.Store.put(:key, %{other: :map}, name: :test_store)


Allow? [Yn] 
Interactive Elixir (1.12.2) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> Mini.Store.put(:key, %{my: :map}, name: :test_store)
:ok
pry(2)> #wait
nil
pry(3)> Mini.Store.put(:key, %{other: :map}, name: :test_store)
:ok
pry(4)> Mini.Store.get(:key, name: :test_store)
[%{other: :map}]

Yeah, but this is just some code I’m using to understand GenServer timeouts.
The idea is to perform the put operation asynchronously and check later whether the insertion has completed.
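A sketch of that async-then-check pattern, assuming the :sleep option is meant to delay the write inside handle_cast (the original callbacks weren't posted, so SlowStore and its messages are hypothetical). A get with a short timeout comes back as {:not_ready, ...} while the server is still busy; a later get without a timeout simply queues behind the cast:

```elixir
defmodule SlowStore do
  # Hypothetical: handle_cast sleeps to simulate a slow insert, like the
  # :sleep option passed by the original put/3.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{})

  def put(pid, key, value, sleep), do: GenServer.cast(pid, {:put, key, value, sleep})

  # Same shape as the original get/2: a timed-out call becomes
  # {:not_ready, reason} instead of crashing the caller.
  def get(pid, key, timeout \\ :infinity) do
    try do
      GenServer.call(pid, {:get, key}, timeout)
    catch
      :exit, reason -> {:not_ready, reason}
    end
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:put, key, value, sleep}, state) do
    Process.sleep(sleep)
    {:noreply, Map.update(state, key, [value], &(&1 ++ [value]))}
  end

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key, []), state}
end

{:ok, pid} = SlowStore.start_link()
:ok = SlowStore.put(pid, :key, %{my: :map}, 200)

# The server is still sleeping inside handle_cast, so a 50 ms call times out.
{:not_ready, {:timeout, _}} = SlowStore.get(pid, :key, 50)

# Without a timeout, the call waits until the cast has been applied.
[%{my: :map}] = SlowStore.get(pid, :key)
```

The second get doesn't succeed because enough time has passed; it succeeds because its message is handled after the cast.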

I’d assume that the server crashes for some reason and the supervisor restarts it, so you only get the second put.
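One way to test that theory, as a hedged sketch: monitor the server pid (start_supervised!/1 returns it) and look for a :DOWN message after the puts. CrashProbe is a hypothetical helper, and Agent stands in for Mini.Store so the snippet runs on its own; the Agents are started without a link so killing one doesn't take the caller down with it.

```elixir
defmodule CrashProbe do
  # Hypothetical helper: reports whether the monitored process went down
  # (e.g. crashed and was restarted by its supervisor) since monitoring began.
  def check(ref, wait_ms \\ 0) do
    receive do
      {:DOWN, ^ref, :process, _pid, reason} -> {:went_down, reason}
    after
      wait_ms -> :still_up
    end
  end
end

# Healthy server: no :DOWN message is waiting after the updates.
{:ok, healthy} = Agent.start(fn -> %{} end)
healthy_ref = Process.monitor(healthy)
Agent.update(healthy, &Map.put(&1, :key, [%{my: :map}]))
IO.inspect(CrashProbe.check(healthy_ref))
# prints :still_up

# Crashed server: the monitor delivers a :DOWN message with the exit reason.
{:ok, doomed} = Agent.start(fn -> %{} end)
doomed_ref = Process.monitor(doomed)
Process.exit(doomed, :kill)
IO.inspect(CrashProbe.check(doomed_ref, 100))
# prints {:went_down, :killed}
```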


As for your actual question, it’s unclear what’s wrong, because your handle_call and handle_info functions aren’t included. Try adding IO.puts everywhere until you understand the flow of execution.
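As an alternative to scattering IO.puts calls (a different technique, swapped in here), the :sys module can show a running server's state and messages without editing its callbacks; :sys.get_state(:test_store) works with the registered name too. A minimal sketch, with an Agent standing in for the store:

```elixir
# Agent stands in for Mini.Store here; an Agent's state is just the stored value.
{:ok, pid} = Agent.start_link(fn -> %{} end)

# Print a debug line to stdout for each message the process handles.
:sys.trace(pid, true)

Agent.cast(pid, &Map.put(&1, :key, [%{my: :map}]))
# get_state is itself a message, so it is handled after the cast above.
IO.inspect(:sys.get_state(pid), label: "state after first put")

Agent.cast(pid, &Map.update!(&1, :key, fn list -> list ++ [%{other: :map}] end))
IO.inspect(:sys.get_state(pid), label: "state after second put")

:sys.trace(pid, false)
```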

This isn’t the cause of your current issue, but it would cause disastrous results under parallel load, because the state of the GenServer can change between the get and put calls. This causes “lost updates”, in the same way that doing a SQL read-modify-write access without a transaction can.

The standard idiom for solving that issue is to do the calculation inside the GenServer (in handle_call, etc.); that ensures no other code can modify the state while a put is running.
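A sketch contrasting the two approaches under parallel load, with a hypothetical ParallelStore module: racy_put reads on the client and writes back the computed list, like the original put/3, while safe_put does the whole read-modify-write inside one handle_call. Only the second is guaranteed to keep every value:

```elixir
defmodule ParallelStore do
  # Hypothetical store used to compare client-side vs server-side updates.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, %{})

  # Racy: read on the client, compute, then overwrite on the server.
  # Another caller can write between the two calls, and its value is lost.
  def racy_put(pid, key, value) do
    existing = GenServer.call(pid, {:get, key})
    GenServer.call(pid, {:set, key, existing ++ [value]})
  end

  # Safe: the read-modify-write runs inside a single handle_call.
  def safe_put(pid, key, value), do: GenServer.call(pid, {:append, key, value})

  def get(pid, key), do: GenServer.call(pid, {:get, key})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key, []), state}
  def handle_call({:set, key, list}, _from, state), do: {:reply, :ok, Map.put(state, key, list)}

  def handle_call({:append, key, value}, _from, state) do
    {:reply, :ok, Map.update(state, key, [value], &(&1 ++ [value]))}
  end
end

{:ok, pid} = ParallelStore.start_link()

1..50
|> Enum.map(fn i -> Task.async(fn -> ParallelStore.racy_put(pid, :racy, i) end) end)
|> Task.await_many()

1..50
|> Enum.map(fn i -> Task.async(fn -> ParallelStore.safe_put(pid, :safe, i) end) end)
|> Task.await_many()

IO.inspect(length(ParallelStore.get(pid, :racy)), label: "racy")
IO.inspect(length(ParallelStore.get(pid, :safe)), label: "safe")
```

The racy count depends on scheduling, so it may or may not come out below 50 on a given run; the safe count is always 50.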


Thanks for all the replies.

It looks like it was a very strange race condition, because after doing what @al2o3cr suggested here, the problem went away.

I’m still not sure why I had a race condition even when manually calling put inside a pry session in the test, giving the server plenty of time between calls.

Elapsed time is no way to coordinate stuff. Centralizing access through GenServer calls is the way to go, since only one message is ever processed at a time. Glad you solved your problem. :+1:
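A likely explanation for the symptom, offered as a reading of the posted code rather than something confirmed in the thread: put/3 calls get(key) without forwarding opts, so that lookup goes to the server registered under the default name (Mini.Store) rather than :test_store. Assuming the application also starts a Mini.Store under its default name during the test run (the plain IEx session suggests it does), that server has nothing stored under :key, so each put sends [value] to :test_store, presumably replacing the previous list. That would also fit the pry session and the fact that moving the computation into the GenServer made the problem go away. A self-contained reproduction, with a hypothetical DemoStore standing in for Mini.Store:

```elixir
defmodule DemoStore do
  # Stand-in for Mini.Store: state is a map of key => list, and the server
  # stores whatever list the client sends, like the original cast.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, name: Keyword.get(opts, :name, __MODULE__))
  end

  # Original shape: get(key) is called WITHOUT opts, so it always asks the
  # server registered as DemoStore, whatever :name was passed.
  def buggy_put(key, value, opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.cast(name, {:put, key, get(key) ++ [value]})
  end

  # Fixed: forward the name so the read and the write hit the same server.
  def fixed_put(key, value, opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.cast(name, {:put, key, get(key, name: name) ++ [value]})
  end

  def get(key, opts \\ []) do
    GenServer.call(Keyword.get(opts, :name, __MODULE__), {:get, key})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:put, key, list}, state), do: {:noreply, Map.put(state, key, list)}

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key, []), state}
end

# Two servers, as in the test run: the app's default one and :test_store.
{:ok, _} = DemoStore.start_link()
{:ok, _} = DemoStore.start_link(name: :test_store)

DemoStore.buggy_put(:key, %{my: :map}, name: :test_store)
DemoStore.buggy_put(:key, %{other: :map}, name: :test_store)
IO.inspect(DemoStore.get(:key, name: :test_store))
# prints [%{other: :map}]

DemoStore.fixed_put(:key2, %{my: :map}, name: :test_store)
DemoStore.fixed_put(:key2, %{other: :map}, name: :test_store)
IO.inspect(DemoStore.get(:key2, name: :test_store))
# prints [%{my: :map}, %{other: :map}]
```

In the original put/3, the equivalent one-line change would be existing = get(key, name: name); the race described above under parallel load would still remain.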
