Nice shutdown for a worker pool

alexcastano · June 1, 2023, 3:56pm

I’ve been working on a pool of remote sessions, where each session is created and closed with HTTP requests. Sessions are limited in number, so I figured, why not implement a pool? An additional requirement is that these sessions must be refreshed every 15 minutes if idle.

I decided to use a GenServer to handle the sessions. When it starts, it makes a request to create a session and stores the secret. On termination, it makes another request to close the session. It has a public API that makes an HTTP request using the session, parses the response, and sends it back to the client. And if it’s not used for 15 minutes, it’ll refresh the session itself.
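A minimal sketch of the worker described above may make the later discussion easier to follow. The module name `SessionWorker` and the `:api` option (a map of four functions standing in for the real HTTP calls made through Finch) are assumptions for illustration, not the original code. Note that `terminate/2` only runs on a supervisor shutdown if the process traps exits.

```elixir
defmodule SessionWorker do
  use GenServer

  # Refresh the session after 15 idle minutes unless overridden in opts.
  @idle_timeout :timer.minutes(15)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Public API: perform a request using the session held by this worker.
  def request(pid, payload), do: GenServer.call(pid, {:request, payload})

  @impl true
  def init(opts) do
    # Without trapping exits, terminate/2 is skipped when the parent
    # sends a :shutdown exit signal.
    Process.flag(:trap_exit, true)

    # Hypothetical: :api is a map of functions wrapping the HTTP calls.
    api = Keyword.fetch!(opts, :api)
    idle = Keyword.get(opts, :idle_timeout, @idle_timeout)

    secret = api.create.()
    # The third element is a GenServer timeout: if no message arrives
    # within `idle` milliseconds, handle_info(:timeout, state) is called.
    {:ok, %{api: api, secret: secret, idle: idle}, idle}
  end

  @impl true
  def handle_call({:request, payload}, _from, state) do
    reply = state.api.call.(state.secret, payload)
    {:reply, reply, state, state.idle}
  end

  @impl true
  def handle_info(:timeout, state) do
    # Idle for the whole timeout: refresh the remote session.
    secret = state.api.refresh.(state.secret)
    {:noreply, %{state | secret: secret}, state.idle}
  end

  # Any other message (for example a trapped {:EXIT, ...}) restarts the timer.
  def handle_info(_msg, state), do: {:noreply, state, state.idle}

  @impl true
  def terminate(_reason, state) do
    # Close the remote session; this is the call that needs the HTTP pool.
    state.api.close.(state.secret)
    :ok
  end
end
```

Because the close request happens in `terminate/2`, the worker depends on the HTTP pool still being alive at that moment, which is exactly what the shutdown order has to guarantee.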

To manage the pool, I’ve got a Supervisor looking after two children:

Finch HTTP pool
Poolboy worker pool

So when a Poolboy worker needs to make a request, it grabs a connection from the Finch HTTP pool. However, things start to go south when it’s time to shut everything down.

When the app needs to shut down, the Supervisor sends a shutdown signal to Poolboy. But it seems Poolboy isn’t that patient and doesn’t wait for its children to stop. From what I can tell, the exit/2 function works asynchronously. So, the children get the shutdown signal, but the Poolboy worker pool doesn’t stick around.

github.com/devinus/poolboy/blob/master/src/poolboy.erl#L285

                        {noreply, State}
                end
        end;

    handle_info(_Info, State) ->
        {noreply, State}.

    terminate(_Reason, State) ->
        Workers = queue:to_list(State#state.workers),
        ok = lists:foreach(fun (W) -> unlink(W) end, Workers),
        true = exit(State#state.supervisor, shutdown),
        ok.

    code_change(_OldVsn, State, _Extra) ->
        {ok, State}.

    start_pool(StartFun, PoolArgs, WorkerArgs) ->
        case proplists:get_value(name, PoolArgs) of
            undefined ->
                gen_server:StartFun(?MODULE, {PoolArgs, WorkerArgs}, []);
            Name ->

Then the Finch pool shuts down, the Supervisor follows suit, and finally, the workers. But, here’s the twist: the workers need the Finch pool to send out a final request to close the remote session. So, they crash without closing the remote sessions.
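The asynchronous behaviour of `exit/2` can be shown in isolation. The sketch below uses a hypothetical `AsyncExitDemo` module that is unrelated to Poolboy's code: it sends a `:shutdown` exit signal to a process that traps exits and spends 200 ms on simulated cleanup. `Process.exit/2` returns `true` right away, and only a monitor tells the caller when the target has actually stopped.

```elixir
defmodule AsyncExitDemo do
  def run do
    parent = self()

    pid =
      spawn(fn ->
        Process.flag(:trap_exit, true)
        send(parent, :ready)

        receive do
          {:EXIT, _from, :shutdown} ->
            # Simulated cleanup, e.g. closing a remote session.
            Process.sleep(200)
        end
      end)

    # Make sure the target traps exits before we signal it.
    receive do
      :ready -> :ok
    end

    ref = Process.monitor(pid)

    # Returns true immediately; it does not wait for the target to stop.
    true = Process.exit(pid, :shutdown)
    alive_after_signal = Process.alive?(pid)

    # Waiting for the monitor message is what makes the shutdown synchronous.
    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end

    {alive_after_signal, Process.alive?(pid)}
  end
end
```

Calling `AsyncExitDemo.run()` returns `{true, false}`: the process is still alive just after the signal is sent and is gone only once the `:DOWN` message has arrived.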

What’s throwing me for a loop is why a library as popular as Poolboy would behave like this. Am I missing something here?

I also thought about using NimblePool, but decided against it because:

NimblePool may not be a good option to manage processes. After all, the goal of NimblePool is to avoid creating processes for resources. If you already have a process, using a process-based pool such as poolboy will provide a better abstraction.

Any ideas on this? Maybe a different approach?

PS: I opened an issue in Poolex.

mpope · June 1, 2023, 4:18pm

If you need a solution before a fix is released in the libraries, you could add a process that each worker links with, called the WorkerManager. This process adds each registered worker to a set for tracking and listens for their deaths. Once all of the linked processes have died, it can terminate itself. You can use a supervisor with a rest_for_one strategy and, I think, the child order [FinchPool, WorkerManager, WorkerPool]. The WorkerPool can shut down, the manager will wait for all the workers and terminate once the final worker dies, and then finally the FinchPool can shut down. This ordering should allow a worker that is shutting down to grab a Finch connection.
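One possible shape for such a manager is sketched below. It is a variation on the proposal, not a definitive implementation: it tracks workers with monitors instead of links, and the `register/2` function is a hypothetical API that each worker would call from its `init/1`. The manager blocks in `terminate/2` until every registered worker is down.

```elixir
defmodule WorkerManager do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.take(opts, [:name]))
  end

  # Hypothetical API: a worker registers itself, e.g. from its init/1.
  def register(manager, pid), do: GenServer.call(manager, {:register, pid})

  @impl true
  def init(:ok) do
    # Trap exits so that terminate/2 runs when the supervisor shuts us down.
    Process.flag(:trap_exit, true)
    {:ok, %{}}
  end

  @impl true
  def handle_call({:register, pid}, _from, workers) do
    # Assumption: monitors are used here in place of links.
    ref = Process.monitor(pid)
    {:reply, :ok, Map.put(workers, ref, pid)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, workers) do
    {:noreply, Map.delete(workers, ref)}
  end

  def handle_info(_msg, workers), do: {:noreply, workers}

  @impl true
  def terminate(_reason, workers) do
    # Block until every tracked worker has gone down.
    await_all(workers)
  end

  defp await_all(workers) when map_size(workers) == 0, do: :ok

  defp await_all(workers) do
    receive do
      {:DOWN, ref, :process, _pid, _reason} ->
        await_all(Map.delete(workers, ref))
    end
  end
end
```

The wait is bounded by the `shutdown` value in the manager's child spec (5000 ms by default for a worker child), after which the supervisor kills it, so that value would need to be at least as long as the slowest session close.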

alexcastano · June 1, 2023, 8:09pm

Thank you for your proposal. The last commit in the Poolboy repo was more than 4 years ago, so I'm not hopeful about a fix. I’ll try your solution if I cannot find a good alternative to Poolboy.

I don’t like that the worker has to know about the manager in order to link, but I’m sure it will eradicate the bug.

mpope · June 1, 2023, 8:39pm

Hopefully it works; it’s a common enough pattern, described in the Supervisor section of Adopting Erlang. It might be overkill, but you could also check out inaka/worker_pool, an Erlang worker pool.

dimitarvp · June 1, 2023, 8:46pm

Poolex is another alternative.

cmo · June 1, 2023, 10:13pm

pooler exists too

alexcastano · July 12, 2023, 2:56pm

Hello, sorry for the delay. I am still working on this but don’t have a final decision.

pooler is very cool. I tried it, but the pools must be created under the :pooler_sup tree. This is very inconvenient in our case. We want to launch the worker pool and the Finch pool under the same supervisor. This way both pools are started and shut down together, and in the right order.

We tried poolex, and we found some minor issues that have since been fixed. The maintainer is very friendly. However, I feel like it is not widely used or “production ready”. Maybe we will keep using it and try to fix all the issues, but ours is not the best project for doing that.

We didn’t try worker_pool because it doesn’t have the “overflow workers” feature, which is very useful for our current situation.

Thank you for your proposals.

dimitarvp · July 12, 2023, 4:04pm

In that case I think that the Erlang :jobs library could serve you because it also tries to auto-adapt depending on the observed (BEAM) system health.