Question on fundamental nature of how GenServers work

I have been fighting this for days and would appreciate some insight.
My mission is to set a timeout on each GenServer I create so that, once a GenServer's pid is no longer being called, it automatically terminates itself.

I pasted my code below; the GenServer's timeout is set to 10 seconds.

Yet when I create two GenServers by running the following code in iex:

#####################
Stak.start_link(8)
Stak.start_link(9) # I wait 5 seconds to type this line
#####################

Yet 10 seconds after I type “Stak.start_link(8)”, the pids from both lines I typed terminate at the same time. Grrrr

Another mystery to me is if I type the lines:

#####################
Stak.start_link(8)
Stak.get_state(8) # type this line before 10 seconds pass
#####################

the pid from “Stak.start_link(8)” never times out or terminates, even though I expected it to terminate 10 seconds after I typed “Stak.get_state(8)”.

#####################################
######### My Code ###################
#####################################

defmodule Stak do
  use GenServer

  def start_link(event_id) do
    name = via_tuple(event_id)
    GenServer.start_link(__MODULE__, %{}, name: name)
  end

  defp via_tuple(event_id) do
    {:via, Registry, {Registry.Runningevents, event_id}}
  end

  def set_timer(event_id, timer_config) do
    GenServer.call(via_tuple(event_id), {:set_timer, timer_config})
  end

  def start_timer(event_id) do
    GenServer.cast(via_tuple(event_id), {:start})
  end

  def get_state(event_id) do
    name = via_tuple(event_id)
    IO.inspect(name)

    GenServer.call(name, {:get_state})
  end

  def get_pid(event_id) do
    [{pid, nil}] = Registry.lookup(Registry.Runningevents, event_id)
    pid
  end


  def init(state) do
    ### Trap exits: exit signals now arrive as {:EXIT, pid, reason} messages, and terminate/2 runs when the parent exits
    Process.flag(:trap_exit, true)
    IO.puts("The server Started")
    IO.inspect(state)
    {:ok, state, 10000}
  end

  def terminate(reason, state) do
    IO.inspect(state)
    IO.inspect(reason)
    IO.puts("The server terminated")

    IO.puts("#{__MODULE__}.terminate/2 called wit reason: #{inspect(reason)}")
  end

  ##################################   server api   ##################################

  def handle_call({:get_state}, _from, state) do
    IO.inspect("The state")
    IO.inspect(state)

    {:reply, state, state}
  end

  def handle_call({:set_timer, timer_config}, _from, state) do
    IO.inspect(timer_config)
    IO.inspect(state)
    {:reply, timer_config, timer_config}
  end

  def handle_info(:timeout, new_state) do
    IO.inspect(new_state)
    IO.inspect("time out")
    {:stop, :timeout, new_state}
  end
end

You should not set the timeout in init only. You should return it from every callback (handle_call, handle_cast, handle_info), because each incoming message cancels the previous timeout, so you need to pass it again.

Something like this

@timeout 6 * 60 * 60 * 1_000

@impl GenServer
def handle_call(:get_state, _from, state), 
  do: {:reply, serialize_state(state), state, @timeout}
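A minimal sketch of the question's Stak with this suggestion applied: every callback returns the timeout again, and the resulting :timeout message is handled in handle_info/2. StakSketch is a hypothetical name, not code posted in the thread. It assumes a unique Registry named Registry.Runningevents is already running, and it shortens the timeout to 300 ms only so the behaviour is easy to observe. It stops with :normal so that linked callers such as the iex shell are not taken down.

```elixir
defmodule StakSketch do
  # Assumes Registry.start_link(keys: :unique, name: Registry.Runningevents)
  # has already been called (e.g. from a supervision tree).
  use GenServer

  # Idle timeout in ms; shortened for the demo (the question uses 10_000).
  @timeout 300

  def start_link(event_id),
    do: GenServer.start_link(__MODULE__, %{}, name: via_tuple(event_id))

  def get_state(event_id), do: GenServer.call(via_tuple(event_id), :get_state)

  def set_timer(event_id, config),
    do: GenServer.call(via_tuple(event_id), {:set_timer, config})

  defp via_tuple(event_id), do: {:via, Registry, {Registry.Runningevents, event_id}}

  @impl GenServer
  def init(state), do: {:ok, state, @timeout}

  # Every reply carries @timeout again. A reply without it means "wait
  # indefinitely" (:infinity), which is why get_state kept the server alive.
  @impl GenServer
  def handle_call(:get_state, _from, state), do: {:reply, state, state, @timeout}

  def handle_call({:set_timer, config}, _from, _state),
    do: {:reply, config, config, @timeout}

  # No message arrived within @timeout ms: stop with :normal so linked
  # processes (like the iex shell) are not taken down with us.
  @impl GenServer
  def handle_info(:timeout, state), do: {:stop, :normal, state}
end
```

With this version, each server times out independently, and a get_state/1 call postpones the shutdown by one more full timeout instead of disabling it.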

That will not terminate the GenServer but only suspend it.

To actually get a termination, one needs to work actively with :erlang.send_after/3,4 and :erlang.cancel_timer/2, storing the last timer reference in the GenServer's state.

Basically the following has to happen in each and every callback, on each and every clause:

:erlang.cancel_timer(state.tr, []) # we do not want the timeout to fire while we do the following work

# the actual work

tr = :erlang.send_after(@timeout, self(), :"$shutdown")
state = %{state | tr: tr}
# reply or noreply here

Also a handle_info(:"$shutdown", state) clause needs to exist that triggers the actual shutdown.
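A sketch of this explicit-timer approach, using Elixir's Process.send_after/3 and Process.cancel_timer/1 wrappers instead of calling the :erlang functions directly. TimerServer, the {:idle_shutdown, token} message (instead of :"$shutdown") and the 300 ms value are illustrative assumptions. The token guards against a timer that fired just before being cancelled, whose message would otherwise already be sitting in the mailbox.

```elixir
defmodule TimerServer do
  # Hypothetical module: the last timer reference lives in the state, and
  # every callback that counts as "activity" re-arms it.
  use GenServer

  # Short value for the demo.
  @idle_timeout 300

  def start_link(data), do: GenServer.start_link(__MODULE__, data)
  def get_state(pid), do: GenServer.call(pid, :get_state)

  @impl GenServer
  def init(data), do: {:ok, rearm(%{data: data, tref: nil, token: nil})}

  @impl GenServer
  def handle_call(:get_state, _from, state), do: {:reply, state.data, rearm(state)}

  # Only the message carrying the current token triggers the shutdown.
  @impl GenServer
  def handle_info({:idle_shutdown, token}, %{token: token} = state),
    do: {:stop, :normal, state}

  # A stale timer message (fired before it could be cancelled) is ignored.
  def handle_info({:idle_shutdown, _stale}, state), do: {:noreply, state}

  # Cancel the previous timer (if any) and start a fresh one.
  defp rearm(state) do
    if state.tref, do: Process.cancel_timer(state.tref)
    token = make_ref()
    tref = Process.send_after(self(), {:idle_shutdown, token}, @idle_timeout)
    %{state | tref: tref, token: token}
  end
end
```

Unlike the built-in timeout, only callbacks that call rearm/1 count as activity here, so unrelated messages do not keep the process alive.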

Well… when I use this syntax, the terminate function is called after @timeout expires.

Is there something I am missing?


No, I was the one missing something. It seems I had misread the documentation of the timeout value until today.

But after reading it again today, I think you are nearly right.

As far as I read the docs (today), a :timeout message is sent to the GenServer, and it has to be handled in a handle_info/2 callback.

Perhaps your terminate/2 is called because of a FunctionClauseError?

That in turn might cause unintended restarts by supervisors, so we should examine what is really happening.

Sadly I cannot participate, as my Erlang/Elixir box is currently out of order (and the other one is out of physical reach).


No error in the console, and no need for handle_info(:timeout…) either, at least in the worker I am using.

UPDATE: Sorry, I am using handle_info, as expected.

def handle_info(:timeout, state), do: stop_and_clean(state, {:shutdown, :timeout})

defp stop_and_clean(state, reply), do: ...

Both processes are linked to the same process that started them (hence the name GenServer.start_link/3), so the first one takes the second one down when it terminates.

Instead of linking, you can use Process.monitor/1 (and Process.demonitor/2) with GenServer.start/3.
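A small sketch of that unlinked alternative: start the server with GenServer.start/3 and monitor it, so its exit reaches the caller as a :DOWN message instead of an exit signal. Unlinked and Watcher are hypothetical modules; the server deliberately stops with the non-normal reason :timeout to show that the caller still survives (an error report is logged).

```elixir
defmodule Unlinked do
  # Minimal GenServer that stops with a non-normal reason after 100 ms idle.
  use GenServer

  @impl GenServer
  def init(state), do: {:ok, state, 100}

  @impl GenServer
  def handle_info(:timeout, state), do: {:stop, :timeout, state}
end

defmodule Watcher do
  # Starts the server WITHOUT a link and monitors it instead.
  def start_and_wait do
    {:ok, pid} = GenServer.start(Unlinked, %{})
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:down, reason}
    after
      1_000 ->
        # Stop watching; :flush drops a :DOWN message that may already be queued.
        Process.demonitor(ref, [:flush])
        :still_running
    end
  end
end
```

Here Watcher.start_and_wait/0 returns {:down, :timeout} and the calling process keeps running.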

Scratch that: I missed the Process.flag(:trap_exit, true).

What seems to be happening:

  • Stak#1 terminates due to the timeout
  • The shell process isn’t trapping exits, so it terminates when it receives the exit signal from Stak#1
  • The termination of the shell process sends an exit signal to Stak#2 initiating its termination.
  • The reason propagated for all these exits is :timeout as it originated from Stak#1

The easiest way to fix it:

  def handle_info(:timeout, new_state) do
    IO.inspect(new_state)
    IO.inspect("time out")
    # {:stop, :timeout, new_state}
    {:stop, :normal, new_state}
  end

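This way Stak#1's termination counts as normal: the :normal exit signal still reaches the linked shell, but a process that isn’t trapping exits ignores it, so the shell (and with it Stak#2) survives.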
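The cascade can be reproduced with plain processes. In this sketch (LinkDemo is a hypothetical module), a bystander plays the iex shell: it is linked to a partner and does not trap exits. The partner plays Stak#1 and exits with the given reason.

```elixir
defmodule LinkDemo do
  # Returns :alive if the linked, non-trapping bystander survives its
  # partner exiting with `reason`, or {:died, why} if it was taken down.
  def bystander_outcome(reason) do
    caller = self()

    bystander =
      spawn(fn ->
        partner =
          spawn_link(fn ->
            receive do
              :go -> exit(reason)
            end
          end)

        send(caller, {:partner, partner})
        Process.sleep(:infinity)
      end)

    ref = Process.monitor(bystander)

    partner =
      receive do
        {:partner, p} -> p
      end

    send(partner, :go)

    receive do
      {:DOWN, ^ref, :process, ^bystander, why} -> {:died, why}
    after
      200 ->
        # Still alive: clean up without leaving a :DOWN message behind.
        Process.demonitor(ref, [:flush])
        Process.exit(bystander, :kill)
        :alive
    end
  end
end
```

LinkDemo.bystander_outcome(:normal) returns :alive, while LinkDemo.bystander_outcome(:timeout) returns {:died, :timeout}, matching the chain of exits described in the list above.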

Another bit that can be surprising: add

  # assumes a module attribute such as @timeout 10_000 is defined in Stak
  def handle_info({:EXIT, from_pid, reason}, state) do
    IO.puts("Exit from pid #{inspect(from_pid)} - reason: #{inspect(reason)}")
    {:noreply, state, @timeout}
  end

Now

iex(2)> {:ok, pid} = Stak.start_link(8)
The server Started
%{}
{:ok, #PID<0.110.0>}
iex(3)> Task.start(fn -> Process.exit(pid, :timeout) end)
{:ok, #PID<0.112.0>}
Exit from pid #PID<0.112.0> - reason: :timeout
%{}     
"time out"
%{}     
:normal 
The server terminated
Elixir.Stak.terminate/2 called wit reason: :normal
iex(4)> 

Here another (non-parent) process signals the exit.

  • As expected, the GenServer traps the exit and we see the exit message in handle_info/2. The timeout is renewed, and we don’t actually time out until later.

But when the parent process signals the same exit:

iex(2)> {:ok, pid} = Stak.start_link(8)
The server Started
%{}
{:ok, #PID<0.110.0>}
iex(3)> Process.exit(pid, :timeout)
%{}
true
:timeout
The server terminated
Elixir.Stak.terminate/2 called wit reason: :timeout
iex(4)> 
[error] GenServer #PID<0.110.0> terminating
** (stop) time out
Last message: {:EXIT, #PID<0.103.0>, :timeout}
State: %{}
** (EXIT from #PID<0.103.0>) shell process exited with reason: time out 

Here handle_info/2 is bypassed entirely and the GenServer terminates immediately (terminate/2 still runs because the process traps exits).
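Both cases can be reproduced outside iex with the following sketch. ParentExitDemo is a hypothetical GenServer that traps exits and reports to a given pid from handle_info/2 and terminate/2. An exit signal from an unrelated process shows up in handle_info/2, whereas the same signal from the process that called start_link/3 goes straight to terminate/2.

```elixir
defmodule ParentExitDemo do
  # Hypothetical GenServer that reports which callback saw an exit signal.
  use GenServer

  @impl GenServer
  def init(report_to) do
    Process.flag(:trap_exit, true)
    {:ok, report_to}
  end

  # Exit signals from processes other than the parent arrive here as messages.
  @impl GenServer
  def handle_info({:EXIT, from, reason}, report_to) do
    send(report_to, {:handle_info, from, reason})
    {:noreply, report_to}
  end

  # Runs, among other cases, when the parent sends an exit signal
  # or when GenServer.stop/1 is called.
  @impl GenServer
  def terminate(reason, report_to) do
    send(report_to, {:terminate, reason})
  end
end
```

In the parent case, the GenServer also logs an error report because :timeout is not a normal exit reason.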

GenServer.terminate/2

terminate/2 is called if …

Keep in mind that exit signals are a communication path distinct from process mailboxes. Trapping exits “converts” an incoming exit signal into a message for the process. But the GenServer behaviour treats an exit signal from its parent process differently: instead of passing the {:EXIT, parent, reason} message to handle_info/2, it calls terminate/2 with that reason and exits.
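To see the conversion on its own, outside GenServer, here is a sketch with plain processes (TrapDemo is a hypothetical module). The target process optionally traps exits; the caller then sends it an exit signal with Process.exit/2 and reports what happened.

```elixir
defmodule TrapDemo do
  # Spawns a target that optionally traps exits, sends it an exit signal with
  # `reason`, and reports whether it got a message or was terminated.
  def observe(trap?, reason) do
    caller = self()

    target =
      spawn(fn ->
        Process.flag(:trap_exit, trap?)
        send(caller, :ready)

        receive do
          msg -> send(caller, {:got, msg})
        end
      end)

    ref = Process.monitor(target)

    # Make sure the flag is set before the signal is sent.
    receive do
      :ready -> :ok
    end

    Process.exit(target, reason)

    result =
      receive do
        {:got, msg} -> {:message, msg}
        {:DOWN, ^ref, :process, ^target, why} -> {:exited, why}
      after
        500 -> :no_reaction
      end

    # Drop the monitor and any :DOWN message still queued.
    Process.demonitor(ref, [:flush])
    result
  end
end
```

TrapDemo.observe(true, :timeout) yields {:message, {:EXIT, caller_pid, :timeout}}, TrapDemo.observe(false, :timeout) yields {:exited, :timeout}, and the untrappable :kill gives {:exited, :killed} even when the target traps exits.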


http://erlang.org/doc/man/gen_server.html#Module:init-1

If an integer time-out value is provided, a time-out occurs unless a request or a message is received within Timeout milliseconds. A time-out is represented by the atom timeout, which is to be handled by the Module:handle_info/2 callback function. The atom infinity can be used to wait indefinitely, this is the default value.

as referenced by

http://erlang.org/doc/man/gen_server.html#Module:handle_call-3
which itself is referenced by
http://erlang.org/doc/man/gen_server.html#Module:handle_cast-2
with regard to

Result = {noreply,NewState} | {noreply,NewState,Timeout}
  | {noreply,NewState,hibernate}
  | {noreply,NewState,{continue,Continue}}
  | {stop,Reason,NewState}

Process.cancel_timer/2
Process.send_after/3


Thank you Very Much!!!

Specifically @peerreynders, for giving me this code:

{:stop, :normal, new_state}

That solved my problem of all the pids dying when one timed out! This was my most problematic issue.
And thank you for that informative response.

And @kokolegorille, for this code:

@timeout 6 * 60 * 60 * 1_000

def handle_call(:get_state, _from, state),
  do: {:reply, serialize_state(state), state, @timeout}

Thank you very much, your code worked like a charm :)
Very happy and very grateful!!!

@NobbZ
I already knew your name well from the many posts I’ve read that you’ve responded to. So thank you: I had already learned a lot from you before today.
