GenServer slow init task w/ polling: sleep in handle_continue, or use Process.send_after?

Hi,

[Question summary: is it ok to use Process.sleep/1 within a GenServer’s handle_continue callback during the initialization process?]

I’m starting up a (named) GenServer to process items made available by an external API. The items are then processed under a TaskSupervisor that is NOT started by the GenServer (i.e. they are siblings in the supervision tree).

When the tasks in the TaskSupervisor terminate (whether successfully or not), they will notify the GenServer (even across restarts, thanks to this bit of magic: https://github.com/elixir-lang/elixir/blob/v1.8.1/lib/elixir/lib/task/supervisor.ex#L425).

If the GenServer crashes, there could be tasks still being processed by the TaskSupervisor. Naturally, the GenServer won’t have any tracking state related to those tasks. However, as they terminate and send their result to the GenServer, the GenServer will be able to rebuild its state and catch back up. What I’d like to do when initializing the GenServer is to idle if there are ongoing tasks and wait until the TaskSupervisor has no children before starting to accept messages and trigger new work.

My idea was basically to check the number of children returned by Task.Supervisor.children/1 every 5 seconds: if it’s 0, finalize initialization; otherwise loop and wait longer. (Are there smarter ways to do this?) To achieve this polling, I see two approaches:

  1. after init, go into an :initializing status. While in this status, all handlers would return {:error, :not_ready}, and the polling handler would use Process.send_after to schedule itself again in 5 seconds if tasks are still ongoing. If no tasks are ongoing, the status is changed to :active and a message is sent to self() to start triggering work.

  2. use the handle_continue mechanism recently introduced in OTP 21 to achieve a similar goal, but in a cleaner fashion. init would return {:ok, state, {:continue, :initialize}}, and the body of def handle_continue(:initialize, state) would either return {:noreply, state} if no tasks are ongoing, or, if tasks are still running, call Process.sleep(5_000) and then return {:noreply, state, {:continue, :initialize}}.
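For reference, the second approach could be sketched roughly as follows. This is only an illustration: the module name JobTracker, the default supervisor name MyApp.TaskSupervisor, the option keys and the :ping call are all made up for the example. Note that handle_continue returns :noreply tuples, like handle_info.

```elixir
defmodule JobTracker do
  use GenServer

  # Hypothetical names and options, chosen only for this sketch.
  def start_link(opts) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  @impl true
  def init(opts) do
    state = %{
      task_sup: Keyword.get(opts, :task_supervisor, MyApp.TaskSupervisor),
      poll_interval: Keyword.get(opts, :poll_interval, 5_000)
    }

    # Return immediately so the supervisor is not blocked, then keep
    # initializing in handle_continue before any other message is handled.
    {:ok, state, {:continue, :initialize}}
  end

  @impl true
  def handle_continue(:initialize, state) do
    case Task.Supervisor.children(state.task_sup) do
      [] ->
        # No leftover tasks: initialization is complete.
        {:noreply, state}

      _still_running ->
        # Blocks the process: nothing else (including system messages)
        # is handled while sleeping.
        Process.sleep(state.poll_interval)
        {:noreply, state, {:continue, :initialize}}
    end
  end

  # Only here to show that calls are served once initialization is done.
  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end
```

With this shape, a call such as GenServer.call(JobTracker, :ping) sits in the mailbox until the polling loop finishes, so callers need a timeout longer than the expected wait.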

The advantage of the 2nd approach is that other GenServer callbacks (handle_call, etc.) wouldn’t have to treat the case where the status is :initializing which isn’t relevant to their logic.

GenServers shouldn’t typically make use of Process.sleep as that prevents them from being responsive, but it seems like doing so in handle_continue would be acceptable as being “unresponsive” is kind of the goal?


What you describe to me sounds like a much better fit for GenStage / Flow and not for a classic GenServer.

The items from the remote API are indeed processed using Flow (with a producer fetching batches of items), one pipeline instance per remote job that needs to be processed.

I’m using the GenServer to track the state of which remote jobs have been successfully processed (all items fetched and persisted locally), tagging them as processed on the remote API, managing failed jobs, etc.

Basically what I’m wondering about is: if a GenServer has to poll something for completion before being able to finalize its own initialization, is it ok to sleep in handle_continue or will that cause unforeseen problems?


It’s generally better to avoid Process.sleep and use Process.send_after instead, because a sleeping process cannot respond to system messages or debug messages.
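A small experiment can make this visible. The sketch below uses two hypothetical modules (SleepyServer and SleepDemo, not part of any library): a GenServer that sleeps in handle_continue, and a helper that probes it with :sys.get_state/2, which is itself delivered as a system message.

```elixir
defmodule SleepyServer do
  use GenServer

  @impl true
  def init(sleep_ms), do: {:ok, sleep_ms, {:continue, :wait}}

  @impl true
  def handle_continue(:wait, sleep_ms) do
    # While sleeping, the process handles no messages at all.
    Process.sleep(sleep_ms)
    {:noreply, sleep_ms}
  end
end

defmodule SleepDemo do
  # Starts a SleepyServer and asks for its state via a system message,
  # giving up after timeout_ms.
  def probe(sleep_ms, timeout_ms) do
    {:ok, pid} = GenServer.start_link(SleepyServer, sleep_ms)

    try do
      {:ok, :sys.get_state(pid, timeout_ms)}
    catch
      # :sys.get_state/2 exits in the caller when no reply arrives in time.
      :exit, _reason -> :unresponsive
    end
  end
end
```

Calling SleepDemo.probe(300, 50) should report the server as unresponsive, because the request is still waiting in the mailbox when the 50 ms timeout expires, whereas a probe with a timeout longer than the sleep gets the state back.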

Yes, but as far as I can tell the benefit of using handle_continue is that no other messages from the inbox will be processed until the continuation is done. In other words, as soon as you “exit” the handle_continue processing, other messages can be processed by the GenServer and need to be handled (as described in option 1.).

Using send_after means the message can only be handled in a handle_info callback implementation (i.e. option 1. in the above). There’s no way to delay message sending while remaining in a handle_continue case as far as I can tell. So with send_after, the “not yet fully initialized” state needs to be handled in all callback implementations, whereas sleeping in handle_continue would avoid that noise/boilerplate.
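To make the comparison concrete, option 1 might look roughly like the sketch below. The module name PollingTracker, the option keys, the :status call and the :trigger_work message are placeholders invented for the example; the point is the extra :initializing clause that every public callback needs.

```elixir
defmodule PollingTracker do
  use GenServer

  # Hypothetical module and option names, for illustration only.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      status: :initializing,
      task_sup: Keyword.fetch!(opts, :task_supervisor),
      poll_interval: Keyword.get(opts, :poll_interval, 5_000)
    }

    # Schedule the first check; it is handled before any external message.
    send(self(), :check_tasks)
    {:ok, state}
  end

  @impl true
  def handle_info(:check_tasks, %{status: :initializing} = state) do
    case Task.Supervisor.children(state.task_sup) do
      [] ->
        # Nothing left over: become active and kick off new work.
        send(self(), :trigger_work)
        {:noreply, %{state | status: :active}}

      _still_running ->
        # Check again later; the process stays responsive in the meantime.
        Process.send_after(self(), :check_tasks, state.poll_interval)
        {:noreply, state}
    end
  end

  def handle_info(:trigger_work, state) do
    # Placeholder: this is where new work would be started.
    {:noreply, state}
  end

  # The boilerplate in question: reject every call until initialization is done.
  @impl true
  def handle_call(_request, _from, %{status: :initializing} = state) do
    {:reply, {:error, :not_ready}, state}
  end

  def handle_call(:status, _from, state) do
    {:reply, {:ok, state.status}, state}
  end
end
```

A single catch-all clause per callback type, matched on %{status: :initializing} and placed first, keeps the extra handling to a few lines, and the process keeps answering system and debug messages while it waits.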

Does that mean that using handle_continue is not advised when the GenServer initialization requires polling before initialization can be completed?

If you sleep to prevent business messages from being handled you will also prevent system and debug messages from being handled.

The way I would handle this would be to have work producers not send any work to the gen_server until it signals that it is ready for work. GenStage provides some functionality that makes this fairly straightforward.
