Updating state of GenServer with (long running) process

Hi all,

I’m working on a monitoring application that collects the state of the file servers used in our company. Each file server’s state is held by a GenServer that collects the server information every minute. In init, each GenServer starts a :timer.send_interval that sends a message to self every minute to trigger collection of the current server state. The problem is that it’s unpredictable how long a collection will take. It’s not uncommon for it to run longer than 20 seconds, which blocks the GenServer.

To keep the GenServer responsive, I think collecting the server information and updating the GenServer’s state asynchronously is the way to go. After reading the Task documentation I found this section:

It is not recommended to await a long-running task inside an OTP behaviour such as GenServer. Instead, you should match on the message coming from a task inside your handle_info callback.

But what does this mean exactly? Should I simply start the Task with start/1, pass it the pid of the calling GenServer, and have it send a message with the collected server information back to the GenServer when it is done? How would you solve the problem of updating the state of a GenServer asynchronously?

Many thanks.
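
For reference, one possible reading of the quoted Task passage, as a minimal sketch rather than a definitive implementation. The module name FileServerMonitor, the collect_fun argument and the current/1 helper are made up for illustration. Task.async/1 links the task to the GenServer, so a crash in the collection also takes the GenServer down; Task.Supervisor.async_nolink is the usual alternative if that is not wanted.

```elixir
defmodule FileServerMonitor do
  use GenServer

  # `collect_fun` is a hypothetical zero-arity function standing in for the
  # slow collection call; `interval` is in milliseconds.
  def start_link(collect_fun, interval \\ 60_000) do
    GenServer.start_link(__MODULE__, {collect_fun, interval})
  end

  def current(pid), do: GenServer.call(pid, :current)

  @impl true
  def init({collect_fun, interval}) do
    :timer.send_interval(interval, :collect)
    {:ok, %{collect: collect_fun, info: nil, task: nil}}
  end

  @impl true
  def handle_call(:current, _from, state), do: {:reply, state.info, state}

  @impl true
  # Timer tick with no collection in flight: start one and return immediately.
  def handle_info(:collect, %{task: nil} = state) do
    {:noreply, %{state | task: Task.async(state.collect)}}
  end

  # Timer tick while the previous collection is still running: skip this round.
  def handle_info(:collect, state), do: {:noreply, state}

  # The task's reply arrives as {ref, result}, where ref is the task's reference.
  def handle_info({ref, result}, %{task: %Task{ref: ref}} = state) do
    # Drop the monitor and flush its :DOWN message from the mailbox.
    Process.demonitor(ref, [:flush])
    {:noreply, %{state | info: result, task: nil}}
  end

  # Ignore anything else.
  def handle_info(_msg, state), do: {:noreply, state}
end
```

While a collection is running, calls such as FileServerMonitor.current(pid) are still answered with the last known value.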

I’m not very knowledgeable in OTP (I’m currently learning it through The Little Elixir & OTP Guidebook, and I’ll follow up with Elixir in Action). However, I believe this should be done like this:

  1. Have the GenServer trigger a collection every n seconds.

  2. When the timer fires and the state must be collected, spawn a new process that runs in parallel to the GenServer process. This collector process must know the GenServer’s PID.

  3. Once the collector process has finished, have it send a message such as {:state_update, state} to the GenServer process using that PID.

  4. The GenServer receives the message (you can use handle_info for this) and updates its state.

I think this should do it. From what I understand the GenServer is just receiving the state from an external server so there won’t be any kind of race condition about the state.
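
The four steps above could be sketched roughly as follows. This is only an illustration under assumptions: the module name ServerState, the collect_fun argument and the get/1 helper are invented for the example, and Task.start/1 is used as the "spawn a new process" step.

```elixir
defmodule ServerState do
  use GenServer

  # `collect_fun` is a hypothetical zero-arity function that does the slow work.
  def start_link(collect_fun), do: GenServer.start_link(__MODULE__, collect_fun)

  def get(pid), do: GenServer.call(pid, :get)

  @impl true
  def init(collect_fun) do
    # Step 1: send :tick to ourselves every 60 seconds.
    :timer.send_interval(60_000, :tick)
    {:ok, %{collect: collect_fun, info: nil}}
  end

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state.info, state}

  @impl true
  def handle_info(:tick, state) do
    # Capture the GenServer's PID here; inside the task, self() is the task.
    server = self()
    collect = state.collect

    # Steps 2 and 3: run the collection in a separate process and have it
    # send the result back as a plain message.
    Task.start(fn -> send(server, {:state_update, collect.()}) end)
    {:noreply, state}
  end

  # Step 4: store the collected information.
  def handle_info({:state_update, info}, state) do
    {:noreply, %{state | info: info}}
  end
end
```

Note that nothing here stops two collections from overlapping if one takes longer than the interval; the last result to arrive wins.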

As sashaafm said, this is the way to go. However, the other process should be another GenServer. You most likely want this worker process to be under supervision. Another thing that you can do that helps handle the cognitive overload of using PIDs to pass stuff around is to use named processes.
Example:

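# Assumes `import Supervisor.Spec` (deprecated since Elixir 1.5) and that each
# module defines start_link(args, name), registering itself under that name.
children = [
  worker(CronWorker, [[], :cron]),
  worker(JobRunner, [[], :job])  # your long-running job
]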

You can then use the names (:cron and :job) instead of PIDs AND you have your JobRunner supervised so that if it crashes you have some sort of separation.
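
A small sketch of what addressing a supervised process by name can look like. The JobRunner body below is a made-up counter, not the real job, and the example uses child-spec tuples (Elixir 1.5 and later) instead of worker/2.

```elixir
defmodule JobRunner do
  use GenServer

  # Registers the process under the name given in opts, e.g. name: :job.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, :ok, name: Keyword.fetch!(opts, :name))
  end

  @impl true
  def init(:ok), do: {:ok, 0}

  @impl true
  def handle_call(:runs, _from, runs), do: {:reply, runs, runs}

  @impl true
  # Stand-in for kicking off the real job: just count the requests.
  def handle_cast(:run, runs), do: {:noreply, runs + 1}
end

# Start JobRunner under a supervisor, registered as :job.
{:ok, _sup} = Supervisor.start_link([{JobRunner, name: :job}], strategy: :one_for_one)

# No PID needed: the registered name is enough, and it stays valid after a restart.
GenServer.cast(:job, :run)
GenServer.call(:job, :runs)
```

Because the name is re-registered when the supervisor restarts the process, callers do not have to track a new PID after a crash.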

Let us assume that this is an application built with OTP and a supervision tree, and that processes must be restarted when they die, according to Erlang supervision principles. So how do we fit a long-running process into this framework?

I know of two approaches:

  1. Design the process to do its work in small steps, sending messages to itself between steps, so that the GenServer process remains responsive.
  2. Spawn a separate process that interacts with a GenServer process and is linked into the supervision tree via that process.

Your long-running process should checkpoint itself so that a restart does not start from the beginning. I am building a backup application that works this way. The checkpoint is saved to a couple of files, and the checkpoint files are deleted on completion. When the backup process starts, it looks for the checkpoint files and loads them as its current state. Actually, that is a slight simplification: state is also held in a directory walker gen_server, and checkpointing involves saving the state of that as well.
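
To make the first approach and the checkpointing idea concrete, here is a rough sketch. It is not the backup application described above: the module name ChunkedJob, the single checkpoint file, the status/1 helper and the process/1 placeholder are all assumptions for illustration.

```elixir
defmodule ChunkedJob do
  use GenServer

  # `items` is the work list; `path` is a hypothetical checkpoint file location.
  def start_link({items, path}), do: GenServer.start_link(__MODULE__, {items, path})

  def status(pid), do: GenServer.call(pid, :status)

  @impl true
  def init({items, path}) do
    # Resume from the checkpoint if a previous run left one behind.
    {remaining, done} =
      case File.read(path) do
        {:ok, bin} -> :erlang.binary_to_term(bin)
        {:error, _} -> {items, []}
      end

    send(self(), :step)
    {:ok, %{remaining: remaining, done: done, path: path}}
  end

  @impl true
  # Calls are served between steps, so the process stays responsive.
  def handle_call(:status, _from, state) do
    {:reply, {length(state.remaining), Enum.reverse(state.done)}, state}
  end

  @impl true
  # All work finished: delete the checkpoint file.
  def handle_info(:step, %{remaining: []} = state) do
    File.rm(state.path)
    {:noreply, state}
  end

  # Handle one item, save a checkpoint, then queue the next step.
  def handle_info(:step, %{remaining: [item | rest]} = state) do
    done = [process(item) | state.done]
    File.write!(state.path, :erlang.term_to_binary({rest, done}))
    send(self(), :step)
    {:noreply, %{state | remaining: rest, done: done}}
  end

  # Placeholder for one unit of real work.
  defp process(item), do: item * 2
end
```

If the process is restarted by its supervisor with the same arguments, init picks up the checkpoint file and continues with the remaining items instead of starting over.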
