Calling receive/1 within a GenServer callback: genius or idiocy?

mtrudel · May 2, 2023, 11:47pm

Before we get started: yes, I’m well aware that the docs say not to do this. Yes, I’m aware that this is counter to the general shape of the GenServer API. I’m not interested in any of that. What I’m interested in is whether there are any footguns to this idea, in full awareness that it’s probably a lot dumber than it is clever. I’m trying to solve a very specific problem here with a very specific solution and I’m looking for input.

In a nutshell, what I want to do is to be able to call receive/1 within a GenServer.

My reasons for this are primarily to try and replicate within a GenServer the general shape of :gen_tcp’s active: :once pattern, whereby an incoming TCP packet can either be explciticly waited on via :gen_tcp.recv/2, or optionally sent to the process as an Erlang message.

I’ve read through the gen_server.erl and gen.erl source code, and the only ‘special’ thing I can see with messages handling within GenServers are that calls/cast messages have a specific format to them ({:"$gen_call", {pid, ref}, msg} for calls, and {:"$gen_cast", msg} for casts). My hunch here is that if I’m careful about filtering these messages in my receive/1 call (and if I set aside any expectation of temporal ordering between inline received messages and call/cast messages), I should be OK.

Supposing the following proof of concept:

defmodule Test do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)

  def init(arg), do: {:ok, arg}

  def handle_call(msg, _from, state) do
    IO.puts("handle_call #{msg}")
    {:reply, :ok, state}
  end

  def handle_cast(msg, state) do
    IO.puts("handle_cast #{msg}")
    {:noreply, state}
  end

  def handle_info(:wait, state) do
    received = safe_receive(10_000)
    IO.puts("Waited for #{received}")
    {:noreply, state}
  end

  def handle_info(msg, state) do
    IO.puts("handle_info #{msg}")
    {:noreply, state}
  end

  defguardp is_gen_call_msg(msg)
            when is_tuple(msg) and tuple_size(msg) == 3 and elem(msg, 0) == :"$gen_call"

  defguardp is_gen_cast_msg(msg)
            when is_tuple(msg) and tuple_size(msg) == 2 and elem(msg, 0) == :"$gen_cast"

  defguardp is_gen_server_msg(msg) when is_gen_call_msg(msg) or is_gen_cast_msg(msg)

  def safe_receive(timeout \\ :infinity) do
    receive do
      msg when not is_gen_server_msg(msg) -> msg
    after
      timeout -> nil
    end
  end
end

I get the following result:

iex(1)> Test.start_link :ok
{:ok, #PID<0.163.0>}
iex(2)> GenServer.cast(Test, "abc")
handle_cast abc
:ok
iex(3)> send(Test, "abc")
handle_info abc
"abc"
iex(4)> send(Test, :wait)
:wait
iex(5)> send(Test, "abc") # Within 10 seconds
Waited for abc
"abc"
iex(6)> send(Test, "abc")
handle_info abc
"abc"
iex(7)> send(Test, :wait)
:wait
iex(8)> GenServer.cast(Test, "abc") # Within 10 seconds
:ok
Waited for
handle_cast abc

Beyond assuming things about the shape of internal GenServer messages, I can’t see where I’m seriously tempting fate here. Can anyone see any obvious footguns with this? I’m hoping to wrap this general approach up into a library that provides :gen_tcp-like sync/async recv semantics, but backed by Erlang messages instead of a socket (this is all to facilitate GenServer based processes for HTTP/2 streams in Bandit, to better allow for HTTP/2 WebSocket upgrade handling).

al2o3cr · May 3, 2023, 12:45am

IMO the weird vibe here isn’t from calling receive inside a GenServer - after all, that’s what happens when code in a handle_* GenServer.calls another process - but from the “receive ANYTHING but these messages” idiom.

I doubt most callers of safe_receive would be expecting to get a {:system, pid(), term()} message back, but they could since that’s not one of the blocked shapes.

What about making safe_receive a macro, so that callers could be explicit about what they’re waiting for? Something like:

safe_receive({:tcp, ^socket, _}, 10_000)

which would expand to an AST like:

    receive do
      {:tcp, ^socket, _} = msg -> msg
    after
      timeout -> nil
    end

mtrudel · May 3, 2023, 12:57am

That’s a good point about handle_call doing some receiving internally (I hadn’t realized the extent of it until I’d started digging around recently!).

Also a good point about safe_receive/1 assuming too much about the messages you’d want to receive. It probably makes more sense to expose the guards for use directly, and document the hell out of the fact that you want to make ABSOLUTELY SURE you’re not matching your receive/1 on anything that could match is_gen_server_msg/1. I’m still playing around with the general shape of the ‘generic’ part of this approach vs. the parts that are specific to my needs.

Thanks for the feedback!

benwilson512 · May 3, 2023, 3:03am

I would echo @al2o3cr’s remarks, and it also makes me wonder if you want to perhaps explore creating a custom gen behavior module? If you want to do your own message loop but also want the perks of the proc lib stuff?

tj0 · May 3, 2023, 3:40am

Perhaps I’m conservative, but whenever I’ve had to write a loop with receive, I’ve done it outside the genserver paradigm.

genserver.erl


loop(Parent, Name, State, Mod, {continue, Continue} = Msg, HibernateAfterTimeout, Debug) ->
    Reply = try_dispatch(Mod, handle_continue, Continue, State),
    case Debug of
        [] ->
            handle_common_reply(Reply, Parent, Name, undefined, Msg, Mod,
                                HibernateAfterTimeout, State);
        _ ->
            Debug1 = sys:handle_debug(Debug, fun print_event/3, Name, Msg),
            handle_common_reply(Reply, Parent, Name, undefined, Msg, Mod,
                                HibernateAfterTimeout, State, Debug1)
    end;

loop(Parent, Name, State, Mod, hibernate, HibernateAfterTimeout, Debug) ->
    proc_lib:hibernate(?MODULE,wake_hib,[Parent, Name, State, Mod, HibernateAfterTimeout, Debug]);

loop(Parent, Name, State, Mod, infinity, HibernateAfterTimeout, Debug) ->
        receive
                Msg ->
                        decode_msg(Msg, Parent, Name, State, Mod, infinity, HibernateAfterTimeout, Debug, false)
        after HibernateAfterTimeout ->
                loop(Parent, Name, State, Mod, hibernate, HibernateAfterTimeout, Debug)
        end;

loop(Parent, Name, State, Mod, Time, HibernateAfterTimeout, Debug) ->
    Msg = receive
              Input ->
                    Input
          after Time ->
                  timeout
          end,
    decode_msg(Msg, Parent, Name, State, Mod, Time, HibernateAfterTimeout, Debug, false).

Doing a receive within loop just feels really weird as the main loop is waiting on the mailbox first. It’s been a while, but I think you can do a selective match on receive itself which could be safer. I think the key to this working is making sure that you’re matching on different messages to make sure you don’t end up with some really weird lock state.

Considering that you’re doing that, with handle_info(:wait) to safe_receive, it could technically work and just makes me feel very uncomfortable.

mtrudel · May 3, 2023, 1:41pm

Yeah, that’s roughly what I was thinking (I’m not totally sure yet, but I’ll likely layer it like so):

A generic behaviour implemented straight on top of GenServer, just adding a couple of functions / macros to safely listen within GenServer callbacks. This layer is just building blocks, and a bunch of documentation describing that this is 1. pretty unconventional, and 2. a little dangerous if not used carefully.
A message-specific implementation on top of this, that provides something like :gen_tcp’s active: :once pattern (I may actually shim this into Thousand Island’s Handler behaviour if it fits well.
Implement Bandit’s new ‘GenServer-style’ HTTP/2 stream handler on top of this

I’m hoping to isolate most of the oddball stuff in item #1 above.

derek-zhou · May 3, 2023, 2:29pm

Temporarily narrowing the scope of a GenServer by calling receive do is ok; I;ve seen such pattern in high quality code and I have used it myself once or twice. The key here is temporarily. You don’t want your clients to timeout in call and you don’t want to create a large backlog of messages.

rvirding · May 7, 2023, 11:14am

Wouldn’t the safest way be to e very explicit with which which messages you would like to receive and handle here instead of saying anything but the “system” messages? There must be a reason why you want to receive messages here, and which ones.

mtrudel · May 7, 2023, 2:30pm

Indeed! I did myself a disservice in my original question by being far too loose in my example receive; in point of fact I plan to be waiting only on a very specific set of messages. My original question was much more about ‘is it safe / idiomatic to receive on messages within a GenServer callback even if done with care’, and not so much about how wide of a receive net one could cast in that regard.

@al2o3cr and others’ replies made the answer clear (and indeed obvious in hindsight): of course this is a blessed pattern, given that it’s exactly how the caller side of GenServer.call/3 works.

Thanks all for your replies!

hst337 · May 7, 2023, 2:33pm

The last time I needed such behaviour, I ended up writing my own otp-compatible process. I’ve tried to create a discussion about extending Gen primitives with more general primitive allowing handling more interactions instead of just call/cast/info here

It used to be possible to overcome these limitations with libraries like Core (which is deprecated). However, any third-party attempt will still be unable to follow up with OTP extending their GenServer protocols in the future

But coming back to this idea, it is not possible to have such safe_receive macro, even with hacky-hacks like prim_eval:receive which allow user to peek into top of mailbox without actually taking message from it.

I’d stick with safe_receive(pattern, timeout) too

hst337 · May 7, 2023, 2:41pm

Well, it is safe if you don’t care about debugging with sys and other meta information which is collected upon every received message by gen_server runtime. And it is safe if you don’t care about other developers reading this, haha). And there can also be the case where GenServer must always react to some messages as soon as it receives them (for example, logout_user when security matters).

So, the drawbacks are

Hard to understand
Harder to debug
Not suitable when server processes a ton of different messages with different priorities

But technically, if I wanted to receive a specific message and postpone the others, I would use gen_statem or (GenStateMachine elixir wrapper) which allows developer to postpone the messages until specific state is changed

mtrudel · May 7, 2023, 2:59pm

Good point about the sys hooks. I would be able to replicate them pretty effectively I think (possibly with a different set of debug flags though).

In terms of legibility, I’m planning on the core of this approach being a very small, focused module (maybe a library) that adds just the one thing. It should be pretty straightforward to keep this grokkable, I hope.

The issue about long running callbacks causing message receipt to be delayed isn’t unique to this case; it’s a concern for sure but not something particular to this model.

hst337 · May 7, 2023, 3:24pm

You’re right, but only partially. Imagine something like

maybe_logout_user(user_session_server)
... # Some code here
post_picture(user_session_server, picture)

Here developer will be thinking that the logout message sent by maybe_logout_user function will be processed before post_picture message of the server. Your solution with safe_receive breaks order of messages, therefore it suddenly becomes very tricky to understand

And while there are more explicit and adopted primitives like gen_statem which achieve exactly the same results without implicit message processing reordering, I just don’t see how this safe_receive is useful.

hst337 · May 7, 2023, 3:28pm

Plus, sys hooks collect state. How do you want to collect this state with safe_receive? If it would get stored in pdict, developer would require to clean this pdict manually