Design problem: I "want" to do a GenServer.call from within the GenServer

axelson · July 1, 2020, 8:06pm

I’m currently working on a GenServer that uses erlexec to talk to an external binary over stdin/stdout. So I already have a handle_call that associates a “command” with the output of the program (which involves a timeout to know when the “command” is done running). The problem is that within that handle_call I want to execute another “command”.

The “easiest” way to accomplish that would be to use GenServer.call. But of course that will not work, because a GenServer cannot call itself, since that would result in a deadlock. Another possibility that might work is to spawn a Task and have that Task run the GenServer.call; that way the deadlock can be avoided.
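A minimal sketch of the Task workaround, with hypothetical `:outer`/`:sub` requests standing in for the real commands. The outer `handle_call` returns `{:noreply, state}` right away, so the GenServer is free to serve the Task's `GenServer.call`, and the Task answers the original caller with `GenServer.reply/2`.

```elixir
defmodule TaskDelegation do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{sub_calls: 0}}

  @impl true
  def handle_call(:outer, from, state) do
    server = self()

    # The Task runs in its own process, so calling back into `server`
    # cannot deadlock: this callback has already returned :noreply.
    Task.start(fn ->
      sub_result = GenServer.call(server, :sub)
      GenServer.reply(from, {:outer_done, sub_result})
    end)

    {:noreply, state}
  end

  def handle_call(:sub, _from, state) do
    # Hypothetical "sub-command"; it sees the current GenServer state.
    count = state.sub_calls + 1
    {:reply, count, %{state | sub_calls: count}}
  end
end
```

One caveat: `Task.start/1` does not link, so if the Task crashes the original caller only finds out through its own call timeout.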

But my question is this: by even posing this problem, am I indicating a fault in my design? Should there be an easier way, from within the GenServer process, to share the code path that sends the “command” to the erlexec process along with the debounced timeout behavior? Should I introduce another GenServer and have that GenServer be responsible for the “higher-level” protocol (which is why I currently want to execute a “command” during a handle_call)? Curious to know any thoughts.

sribe · July 1, 2020, 8:40pm

It may well indicate that your design is not well formed.

Do you really need to make this a genserver call? Why not just call a private function within the genserver directly? It may be that the design confusion is simply that you’re already in an async operation and there is no need to make an async call from within it.

But assuming that this second call is time-consuming and you need this genserver to be done and ready to handle something else, that wouldn’t work. So, do you actually need the result of the call? For what purpose: just to return it, or to do further processing? If it’s just to return it, then pass the from arg through to another genserver and return :noreply, letting the second genserver “return” the value via GenServer.reply.
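A minimal sketch of passing `from` through to a second process. The module names are hypothetical and squaring a number stands in for the time-consuming work; the front GenServer returns `:noreply` immediately and the worker answers the original caller via `GenServer.reply/2`.

```elixir
defmodule Frontend do
  use GenServer

  def start_link(worker), do: GenServer.start_link(__MODULE__, worker)
  def compute(server, n), do: GenServer.call(server, {:compute, n})

  @impl true
  def init(worker), do: {:ok, %{worker: worker}}

  @impl true
  def handle_call({:compute, n}, from, state) do
    # Hand the caller's `from` to the worker and return right away, so this
    # process can serve other requests while the work is in progress.
    GenServer.cast(state.worker, {:compute, n, from})
    {:noreply, state}
  end
end

defmodule SquareWorker do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, nil, opts)

  @impl true
  def init(nil), do: {:ok, nil}

  @impl true
  def handle_cast({:compute, n, from}, state) do
    # The worker, not Frontend, answers the original caller.
    GenServer.reply(from, n * n)
    {:noreply, state}
  end
end
```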

lucaong · July 1, 2020, 8:48pm

I have had this kind of problem a few times. In most cases, upon reflecting more, it was a symptom of some logic that needed to be extracted. Sometimes I ended up extracting the shared logic into a private function used by several handle_call clauses, and some other times I even ended up extracting it in its own module. The second option can also result in easier testing, and can be called from a Task (or other construct) if necessary, separating functional logic from runtime concerns.
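A minimal sketch of that split, assuming a hypothetical line-based protocol: the pure `CommandProtocol` module can be unit tested without any processes, and a private function reuses it from several `handle_call` clauses. Here the server only records the lines it would have written, as a stand-in for talking to a real external program.

```elixir
defmodule CommandProtocol do
  # Pure helpers: no processes involved, so they are trivial to unit test
  # and can be reused from a GenServer callback, a Task, or anywhere else.
  def encode(cmd, args), do: Enum.join([cmd | args], " ") <> "\n"

  def decode(output), do: String.split(output, "\n", trim: true)
end

defmodule ProtocolServer do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl true
  def init(sent), do: {:ok, sent}

  @impl true
  def handle_call({:list, dir}, _from, sent), do: send_line(sent, "ls", [dir])
  def handle_call({:echo, text}, _from, sent), do: send_line(sent, "echo", [text])

  # Private function shared by both clauses. A real server would write
  # `line` to the external program; this sketch just records it.
  defp send_line(sent, cmd, args) do
    line = CommandProtocol.encode(cmd, args)
    {:reply, line, [line | sent]}
  end
end
```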

axelson · July 1, 2020, 9:41pm

The difficulty I’m facing here is that the functionality I’m writing cannot just be a simple private function, because it needs to be async (explained further below). Since it has to be async, I would need to keep track of which async operation is currently waiting for output from the erlexec program, which would make the logic difficult to follow and probably error-prone.

At the GenServer level all the “commands” sent to the external erlexec process have to be async because the GenServer needs to wait for the handle_info({:stdout, _, _}, state) clause to be invoked by erlexec.

I agree that extracting this logic would be useful, I’m just not sure how to go about it without creating a new process that is built on top of my current GenServer (which seems like it would unnecessarily complicate the design).

Here are two scenarios:

The “Easier Flow” is relatively easy since it allows building on top of the “run async command” logic of the GenServer, however the downside is that the GenServer state is not available when running the “sub-command”.

What I would really like to do is implement the “Ideal Flow”. The hard part for me is that inside a GenServer handle_call callback I want to run the “run this command” logic synchronously, even though the logic actually needs to be async. I should also note that the existing “run this command” handle_call callback stores the from in the GenServer state, returns {:noreply, state}, and later uses the from with GenServer.reply once all the results have been received. When calling the “run this command” logic from the Caller I get the synchronicity for “free” because the call is wrapped in a GenServer.call, but within a GenServer I cannot use that same mechanism to make the logic appear synchronous.

Another way of stating this is that within the GenServer I want to have a run_erlexec_command function that will synchronously call erlexec and wait for the result (potentially waiting for multiple results by using a short debounced timeout) and then return. I could introduce another GenServer to get this, but as mentioned in the Original Post that feels like it would make the system more complex which I am hoping to avoid.
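A minimal sketch of the existing pattern described above: park `from`, return `{:noreply, state}`, collect stdout chunks in `handle_info`, and reply once a short debounce window passes with no new output. `EchoProgram` is a hypothetical stand-in for the erlexec-managed binary that sends `{:stdout, os_pid, data}` messages; the 50 ms window is an arbitrary assumption.

```elixir
defmodule EchoProgram do
  # Hypothetical external program: answers each command with two chunks.
  def start, do: spawn(&loop/0)

  defp loop do
    receive do
      {:stdin, owner, cmd} ->
        send(owner, {:stdout, 0, "got "})
        send(owner, {:stdout, 0, cmd})
        loop()
    end
  end
end

defmodule DebouncedRunner do
  use GenServer

  @debounce_ms 50

  def start_link(program), do: GenServer.start_link(__MODULE__, program)
  def run(server, cmd), do: GenServer.call(server, {:run, cmd})

  @impl true
  def init(program), do: {:ok, %{program: program, from: nil, chunks: [], timer_ref: nil}}

  @impl true
  def handle_call({:run, cmd}, from, %{from: nil} = state) do
    send(state.program, {:stdin, self(), cmd})
    # Park the caller; it is answered later from handle_info.
    {:noreply, %{state | from: from, chunks: []}}
  end

  def handle_call({:run, _cmd}, _from, state), do: {:reply, {:error, :busy}, state}

  @impl true
  def handle_info({:stdout, _os_pid, data}, state) do
    # Every chunk restarts the debounce window. A fresh ref marks the
    # current timer, so timeouts from earlier chunks are ignored below.
    ref = make_ref()
    Process.send_after(self(), {:output_done, ref}, @debounce_ms)
    {:noreply, %{state | chunks: [data | state.chunks], timer_ref: ref}}
  end

  def handle_info({:output_done, ref}, %{timer_ref: ref, from: from} = state) when from != nil do
    output = state.chunks |> Enum.reverse() |> IO.iodata_to_binary()
    GenServer.reply(from, output)
    {:noreply, %{state | from: nil, chunks: [], timer_ref: nil}}
  end

  def handle_info({:output_done, _stale_ref}, state), do: {:noreply, state}
end
```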

ityonemo · July 1, 2020, 10:04pm

Is there a reason why the middle thing isn’t just a Task?

sribe · July 1, 2020, 10:05pm

OK, so you’re already using :noreply and a later reply to suspend the caller until you have results. That’s probably key.

Why does the call to erlexec need to be synchronous? Why not determine in handle_info when the data is complete and should be returned (including, possibly, multiple passes of sending to erlexec again)? Essentially, a state machine that issues subcommands and receives results asynchronously until done.

axelson · July 1, 2020, 10:10pm

The middle process needs to be long-lived because the erlexec process is long-lived and sends output asynchronously to the middle process. And generally you wouldn’t want to have a long-lived Task process.

ityonemo · July 1, 2020, 10:17pm

hm. Are you recycling the erlexec “conn”, and trying to use the genserver as a “lock” to prevent multiprocess contention on the program? Or does your erlexec’d program support multiplexing?

axelson · July 1, 2020, 10:18pm

The call to erlexec doesn’t need to be synchronous. I mainly want it to appear synchronous to make the logic flow easier to follow. The handle_info is already determining when the data is complete and then calling GenServer.reply if from (in the GenServer state) is non-nil. So the handle_info could definitely be made smarter to determine whether the process waiting for the response is an external Caller or the GenServer itself. But if there are multiple places where the GenServer runs “sub-commands”, handle_info might need code for each of those potential “sub-commands”, and I’m worried that the code to manage that will become even more complex. A state machine might help manage the complexity, but I’m not quite sure how I’d apply a state machine to this problem.
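One way to make handle_info "smarter" without a clause per sub-command is to store the waiter as either `{:external, from}` or `{:internal, continuation}`, where the continuation is a function of the output and the current state. A minimal sketch, assuming one `{:stdout, _, data}` message per command (no debouncing) and a hypothetical `SingleChunkProgram` stand-in:

```elixir
defmodule SingleChunkProgram do
  # Hypothetical external program: one stdout message per command.
  def start, do: spawn(&loop/0)

  defp loop do
    receive do
      {:stdin, owner, cmd} ->
        send(owner, {:stdout, 0, "echo:" <> cmd})
        loop()
    end
  end
end

defmodule ContinuationRunner do
  use GenServer

  def start_link(program), do: GenServer.start_link(__MODULE__, program)

  @impl true
  def init(program), do: {:ok, %{program: program, waiter: nil}}

  @impl true
  def handle_call(_request, _from, %{waiter: waiter} = state) when waiter != nil do
    {:reply, {:error, :busy}, state}
  end

  def handle_call({:run, cmd}, from, state) do
    {:noreply, send_command(state, cmd, {:external, from})}
  end

  def handle_call({:run_pair, cmd1, cmd2}, from, state) do
    # The first result is handled inside the GenServer: the continuation
    # gets the output plus the current state and issues the sub-command.
    after_first = fn out1, st ->
      send_command(st, cmd2, {:internal, fn out2, st2 ->
        GenServer.reply(from, {out1, out2})
        st2
      end})
    end

    {:noreply, send_command(state, cmd1, {:internal, after_first})}
  end

  @impl true
  def handle_info({:stdout, _os_pid, data}, %{waiter: waiter} = state) do
    state = %{state | waiter: nil}

    case waiter do
      {:external, from} ->
        GenServer.reply(from, data)
        {:noreply, state}

      {:internal, continuation} ->
        {:noreply, continuation.(data, state)}

      nil ->
        # Unsolicited output: nobody is waiting for it.
        {:noreply, state}
    end
  end

  defp send_command(state, cmd, waiter) do
    send(state.program, {:stdin, self(), cmd})
    %{state | waiter: waiter}
  end
end
```

The continuations read top to bottom much like the "Ideal Flow", and handle_info stays a single generic clause no matter how many composite commands exist.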

axelson · July 1, 2020, 10:22pm

I’m not sure what you mean by that. What do you mean by “conn” in this case?

I am using the GenServer as a lock/synchronization point (although only one process currently talks to it directly). There are actually multiple independent instances of the erlexec’d program (and the associated GenServer) running concurrently, but they don’t communicate or interfere with each other in any way.

ityonemo · July 1, 2020, 11:03pm

I know this isn’t exactly erlexec, but for example, I have long-running outbound SSH connections to machines, and I can pass the ssh connection between processes that need to use it (actually in my case two gen_statems hold onto the ssh connection pid, and in my case both of these statems pass the connection to Tasks to achieve their goals).

So I guess my suggested model is: you have a caller, you could presumably have the caller “check out” an erlexec conn, and pass that conn to Tasks that perform the command/sub-command jobs as necessary. I would say that Task + FSM data structure is appropriate if the job you are doing has fixed scope and transient lifetime; gen_statem/gen_server is appropriate if the lifetime of the FSM is undefined, or if the FSM needs to update its state by being preemptively “pushed” from a system outside the BEAM.

al2o3cr · July 2, 2020, 12:15am

You could have the sequencing of the command and the subcommand happen in the calling process:

defmodule SequencedOperations do
  use GenServer

  def run_simple_command(...etc...) do
    GenServer.call(...)
  end

  def run_complicated_command(...) do
    result1 = run_simple_command(...)

    run_other_simple_command(..., result1)
  end

  # handle_call etc
end

The client still sees a simple blocking API - SequencedOperations.run_complicated_command() and the GenServer is solely responsible for holding the state of the connection to the external binary.
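A runnable sketch of the same idea, with a map standing in for the external program and hypothetical `put`/`get`/`increment` names. Each simple command is one `GenServer.call`; the composite `increment/2` is sequenced in the calling process, so the GenServer never has to call itself.

```elixir
defmodule SequencedOps do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Simple commands: one GenServer.call each.
  def put(server, key, value), do: GenServer.call(server, {:put, key, value})
  def get(server, key), do: GenServer.call(server, {:get, key})

  # Composite command: runs in the caller, built from the simple ones.
  def increment(server, key) do
    current = get(server, key) || 0
    :ok = put(server, key, current + 1)
    current + 1
  end

  @impl true
  def init(store), do: {:ok, store}

  @impl true
  def handle_call({:put, key, value}, _from, store), do: {:reply, :ok, Map.put(store, key, value)}
  def handle_call({:get, key}, _from, store), do: {:reply, Map.get(store, key), store}
end
```

Note the trade-off: another client can slip a call in between `get` and `put`, so this composition is not atomic.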

Something to think about regarding doing this asynchronously: what happens if another call comes in when a command is currently running? Does it go into a queue somehow? What’s the observable state of the GenServer “mid-command”?

At work we tried a similar asynchronous pattern in a complex state machine, and it was a mess - suddenly the async reply could arrive after the machine changed state, and the possible combinations got really unwieldy. In your case, consider simplifying the GenServer to selectively receive just the {:stdout, _, _} message:

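def handle_call({:do_thing, arg1}, _from, state) do
  # send command (send_command/2 is a hypothetical helper that writes arg1 to the program)
  send_command(state, arg1)

  receive do
    {:stdout, _, reply_value} ->
      # do something with reply_value, then reply to the caller
      {:reply, {:ok, reply_value}, state}

    {:DOWN, _, _, _, reason} ->
      # uh-oh the process went away
      # could also let this sit in the mailbox and have a top-level handler for it, with different timeout behavior
      {:reply, {:error, reason}, state}
  after
    10000 ->
      # timed out!
      {:reply, {:error, :timeout}, state}
  end
end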

To answer the previous questions, this approach:

- relies on the process mailbox to store calls that arrive while one is already in-flight
- the state is only observable when a command is not in-flight
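A runnable sketch of the selective-receive approach, assuming a hypothetical `SlowEcho` program that replies with `{:stdout, os_pid, data}` after a short delay and is monitored from `init/1`. Concurrent callers simply queue up in the mailbox and are served one at a time.

```elixir
defmodule SlowEcho do
  # Hypothetical external program: replies after a short delay.
  def start, do: spawn(&loop/0)

  defp loop do
    receive do
      {:stdin, owner, cmd} ->
        Process.sleep(20)
        send(owner, {:stdout, 0, String.upcase(cmd)})
        loop()
    end
  end
end

defmodule BlockingRunner do
  use GenServer

  def start_link(program), do: GenServer.start_link(__MODULE__, program)

  @impl true
  def init(program), do: {:ok, %{program: program, monitor: Process.monitor(program)}}

  @impl true
  def handle_call({:do_thing, arg}, _from, state) do
    send(state.program, {:stdin, self(), arg})
    monitor_ref = state.monitor

    # Selective receive: only these patterns are taken from the mailbox;
    # other calls stay queued until this callback returns.
    receive do
      {:stdout, _os_pid, reply_value} ->
        {:reply, {:ok, reply_value}, state}

      {:DOWN, ^monitor_ref, :process, _pid, reason} ->
        {:reply, {:error, {:program_down, reason}}, state}
    after
      1_000 -> {:reply, {:error, :timeout}, state}
    end
  end
end
```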

akoutmos · July 2, 2020, 1:51am

I have personally always stayed away from doing things like that, as it may have unintended consequences (that, and it feels dirty): https://hexdocs.pm/elixir/GenServer.html#module-receiving-regular-messages

I’m not sure I follow 100% what you are trying to accomplish…so my suggestion may be way off the mark. But here goes.

Based on your “ideal flow” chart, would it work if the caller casts to your GenServer, passing along its own pid, and then immediately waits in a receive block for a message back? In the meantime, the GenServer can perform a number of non-blocking operations in the background, and once enough information has been aggregated, it can send/2 the result back to the original caller’s pid (which needs to be stored in the GenServer state from the initial cast).
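A minimal sketch of the cast-then-receive idea. Module and message names are hypothetical, and the unique `ref` is an addition so the caller only matches its own answer; the "non-blocking operations" are simulated by the server sending itself one message per part. This is roughly what `GenServer.call` already does for you, minus monitoring the server.

```elixir
defmodule CastReply do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Client side: cast with our pid and a unique ref, then block in
  # receive until the server sends the aggregated result back.
  # Assumes `parts` is non-empty; otherwise the caller just times out.
  def run(server, parts, timeout \\ 1_000) do
    ref = make_ref()
    GenServer.cast(server, {:run, self(), ref, parts})

    receive do
      {^ref, result} -> {:ok, result}
    after
      timeout -> {:error, :timeout}
    end
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:run, caller, ref, parts}, state) do
    # Each part arrives later as its own message; the pending request
    # (caller pid, parts remaining, results so far) lives in the state.
    Enum.each(parts, &send(self(), {:part, ref, &1}))
    {:noreply, Map.put(state, ref, {caller, length(parts), []})}
  end

  @impl true
  def handle_info({:part, ref, part}, state) do
    {caller, remaining, acc} = Map.fetch!(state, ref)
    acc = [part | acc]

    if remaining == 1 do
      send(caller, {ref, Enum.reverse(acc)})
      {:noreply, Map.delete(state, ref)}
    else
      {:noreply, Map.put(state, ref, {caller, remaining - 1, acc})}
    end
  end
end
```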

sasajuric · July 2, 2020, 9:37am

Personally, I’d have the GenServer message protocol map 1:1 to the message protocol of the external program. In other words, if the external program supports commands foo and bar, I’d only have handle_call for these two operations. If I wanted to do both, I’d call foo and then bar from the client process. This composition could still be wrapped in the interface function of the GenServer.

If the operation needs to be performed atomically, i.e. we don’t want any other client to invoke something until both foo and bar have finished, my first option would be to support the composite foo_bar command in the external program. If that’s not possible (e.g. if I don’t control the program’s code), then I’d solve this in the GenServer code by keeping track of the remaining commands which need to be invoked before returning the response with GenServer.reply.
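A minimal sketch of tracking the remaining commands in the GenServer state, assuming one `{:stdout, _, data}` message per command and a hypothetical `ReverseProgram` stand-in. The caller is answered with `GenServer.reply/2` only after the last command, and other clients get `{:error, :busy}` in the meantime (queueing them instead would be another option).

```elixir
defmodule ReverseProgram do
  # Hypothetical external program: replies with the reversed command.
  def start, do: spawn(&loop/0)

  defp loop do
    receive do
      {:stdin, owner, cmd} ->
        send(owner, {:stdout, 0, String.reverse(cmd)})
        loop()
    end
  end
end

defmodule CompositeRunner do
  use GenServer

  def start_link(program), do: GenServer.start_link(__MODULE__, program)

  @impl true
  def init(program), do: {:ok, %{program: program, job: nil}}

  @impl true
  def handle_call({:run_all, []}, _from, state), do: {:reply, [], state}

  def handle_call({:run_all, [first | rest]}, from, %{job: nil} = state) do
    send_command(state, first)
    {:noreply, %{state | job: %{from: from, remaining: rest, results: []}}}
  end

  def handle_call({:run_all, _cmds}, _from, state), do: {:reply, {:error, :busy}, state}

  @impl true
  def handle_info({:stdout, _os_pid, data}, %{job: job} = state) when job != nil do
    results = [data | job.results]

    case job.remaining do
      [next | rest] ->
        send_command(state, next)
        {:noreply, %{state | job: %{job | remaining: rest, results: results}}}

      [] ->
        # Only after the last command does the original caller get its answer.
        GenServer.reply(job.from, Enum.reverse(results))
        {:noreply, %{state | job: nil}}
    end
  end

  def handle_info(_unsolicited, state), do: {:noreply, state}

  defp send_command(state, cmd), do: send(state.program, {:stdin, self(), cmd})
end
```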