How to Implement Dynamic Process Pooling in Elixir with Custom Submission Logic?

Hello Elixir Community,

I am working on an Elixir application where I need to introduce a process pooling mechanism (I’m thinking of using :poolboy). The goal is to create a pool of worker processes that can dynamically handle different types of payloads (such as company IDs or other data). Here’s a brief overview of my current setup and the changes I plan to make:

Current Setup:

  1. Application Module: The application has several GenServers and one supervisor with all the GenServers as direct children, using a :one_for_one strategy.
  2. GenServer Processes: Each GenServer has a self-scheduling mechanism that periodically processes a list of companies by sending itself a :tick message after a delay. These GenServers do things like fetching data for these companies, tracking internal progress of the work, tracking the state of the work, submitting work, and so on. (A minimal sketch of this pattern is shown below.)
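
Roughly, the :tick pattern looks like this (a simplified sketch, not the actual code):

```elixir
defmodule MyApp.TickWorker do
  @moduledoc "Simplified sketch of one of the :tick-based GenServers."
  use GenServer

  @tick_interval :timer.seconds(30)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    schedule_tick()
    {:ok, %{progress: %{}}}
  end

  @impl true
  def handle_info(:tick, state) do
    # Fetch data for the enabled companies, track progress, submit work, etc.
    schedule_tick()
    {:noreply, state}
  end

  defp schedule_tick, do: Process.send_after(self(), :tick, @tick_interval)
end
```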

The way it currently works is that there is a process, CompanyDataManager, which has the responsibility of maintaining a list of companies that it periodically refreshes from an API. On arrival of a new company, or a change in its metadata, it does chores like creating a new schema, creating the required entities in existing tables, and so on, after which it updates its local ETS table and the company (along with its metadata) is then ready to be processed.
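
As a rough illustration of that refresh/onboarding loop (the API client, table name, and onboarding steps below are placeholders):

```elixir
defmodule MyApp.CompanyDataManager do
  @moduledoc "Sketch of the periodic refresh; not the real implementation."
  use GenServer

  @refresh_interval :timer.minutes(5)
  @table :enabled_companies

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    send(self(), :refresh)
    {:ok, %{}}
  end

  @impl true
  def handle_info(:refresh, state) do
    # Placeholder API client. New or changed companies go through the
    # onboarding chores (create schema, seed required rows, ...) before
    # being exposed to the rest of the system via ETS.
    for company <- MyApp.CompanyAPI.list_companies!() do
      maybe_onboard(company)
      :ets.insert(@table, {company.id, company.metadata})
    end

    Process.send_after(self(), :refresh, @refresh_interval)
    {:noreply, state}
  end

  defp maybe_onboard(_company), do: :ok
end
```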

Other GenServer processes currently call a function, CompanyDataManager.process_enabled_companies, which takes a function and args as parameters and runs that function for all the companies.
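
One plausible shape for that function, continuing the sketch above:

```elixir
# Still inside MyApp.CompanyDataManager (illustrative, not the real code):
def process_enabled_companies(fun, args) when is_function(fun) and is_list(args) do
  @table
  |> :ets.tab2list()
  |> Enum.each(fn {company_id, metadata} ->
    apply(fun, [%{id: company_id, metadata: metadata} | args])
  end)
end
```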

New Setup:

The new setup introduces a pooling layer in between, so that any of the current GenServer processes can be retrofitted to run in a pooled fashion based on the defined config.
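
For “the defined config” I’m imagining something like this per process (keys and values are made up):

```elixir
# config/config.exs (hypothetical keys)
config :my_app, MyApp.CompanyIngest,
  pooled: true,
  pool_size: 5,
  max_overflow: 2
```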

The payload submission logic should be overridable, since one process might have a different way of generating/submitting payloads and handling them.

The default payload submission logic would be the same as now: an infinite stream, or a process-variable-based circular list, that iterates over the list of companies and submits each one to a worker.
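
To make that overridable, I was thinking of a behaviour with a default implementation, something like this (all names here are hypothetical):

```elixir
defmodule MyApp.PayloadSource do
  @moduledoc "Hypothetical behaviour for overridable payload generation/submission."

  @doc "Returns a (possibly infinite) stream of payloads to hand to the workers."
  @callback payload_stream(opts :: keyword()) :: Enumerable.t()

  @doc "Submits a single payload to a worker."
  @callback submit(payload :: term(), opts :: keyword()) :: :ok | {:error, term()}
end

defmodule MyApp.DefaultPayloadSource do
  @moduledoc "Default: circular iteration over the enabled companies, as today."
  @behaviour MyApp.PayloadSource

  @impl true
  def payload_stream(_opts) do
    # Re-read the enabled company IDs on every pass, which gives an
    # infinite, circular stream (enabled_company_ids/0 is assumed to exist).
    Stream.resource(
      fn -> :ok end,
      fn :ok -> {MyApp.CompanyDataManager.enabled_company_ids(), :ok} end,
      fn :ok -> :ok end
    )
  end

  @impl true
  def submit(company_id, opts) do
    # Default submission: check a worker out of the pool and hand it the payload.
    :poolboy.transaction(opts[:pool], fn worker ->
      GenServer.call(worker, {:process, company_id}, :infinity)
    end)
  end
end
```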

Concerns/Questions

For the most part, I understand that I can convert each process into a supervisor with two children: 1) ProcessManager, which will manage the logic of submission to the workers, and 2) DynamicWorkerSupervisor, which will manage the workers and their lifecycles.
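
In code, that per-pipeline supervisor could look roughly like this (using a plain DynamicSupervisor for the DynamicWorkerSupervisor role; names are illustrative):

```elixir
defmodule MyApp.PooledPipeline do
  @moduledoc "Sketch: one supervisor per pipeline, with a manager and a worker supervisor."
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    worker_sup = Keyword.fetch!(opts, :worker_sup)

    children = [
      {DynamicSupervisor, name: worker_sup, strategy: :one_for_one},
      {MyApp.ProcessManager, opts}
    ]

    # :rest_for_one so that if the worker supervisor dies, the manager
    # (which holds a reference to it) is restarted along with it.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```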

  • What I want is a design generic enough that I can later move any other process from the :tick-based mechanism to the pool-based mechanism.
  • Float errors up and allow custom error-handling logic, similar to how GenServer errors are treated by supervisors (fail if X errors are observed within Y time; see the sketch after this list).
  • Recommendations on a better design or alternatives.
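
On the second point, supervisors already express “fail if X errors within Y time” via :max_restarts and :max_seconds, so the policy could simply live on the worker supervisor (values below are arbitrary):

```elixir
# Inside the pipeline supervisor's init/1 (illustrative values):
children = [
  {DynamicSupervisor,
   name: worker_sup,
   strategy: :one_for_one,
   # The DynamicSupervisor shuts down if more than 10 worker restarts
   # happen within any 60-second window; that failure then propagates
   # to the supervisor above it.
   max_restarts: 10,
   max_seconds: 60},
  {MyApp.ProcessManager, opts}
]

Supervisor.init(children, strategy: :rest_for_one)
```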

Hey @dubeypranav4, welcome. The first rule of pooling in Elixir is that pooled processes are only useful if they actually have a purpose for existing while idle, such as holding onto a resource (like a connection) or some piece of data. If your processes are just sitting around waiting to be assigned work, then I think you should just spawn a new process whenever you want one. There is no benefit to having them around already.

If you’re trying to limit how many of these are alive at a time, you can control that from your ProcessManager.
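
For example, a manager that spawns a bounded number of short-lived tasks could be as simple as this (every name here is made up, and it assumes a `{Task.Supervisor, name: MyApp.WorkerTaskSup}` child somewhere in your tree):

```elixir
defmodule MyApp.ProcessManager do
  @moduledoc "Sketch: one short-lived task per payload, bounded by :max_concurrency."
  use Task, restart: :permanent

  def start_link(opts), do: Task.start_link(__MODULE__, :run, [opts])

  def run(opts) do
    # At most `max_concurrency` worker tasks are alive at any moment;
    # _nolink so a crashing task doesn't take the manager down with it.
    MyApp.WorkerTaskSup
    |> Task.Supervisor.async_stream_nolink(
      payloads(opts),
      &MyApp.Worker.process_company/1,
      max_concurrency: opts[:max_concurrency] || 5,
      timeout: :timer.minutes(5),
      on_timeout: :kill_task
    )
    |> Stream.run()
  end

  # Placeholder: in the real app this would be the infinite company-ID stream.
  defp payloads(_opts), do: Stream.cycle([1, 2, 3])
end
```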

The only other thing I’ll say is that, while it’s a bit hard to tell from your description, make sure to avoid the issues covered in Process-related anti-patterns — Elixir v1.18.0-dev. Make sure to separate the runtime activity that actually needs to be performed from the structure of the code, or even the data, that the activity is being performed on.


Latency / speed. I experimented with task pools a year or so ago and found that if the task pool keeps all the code-executing processes around, there’s a slight speed advantage (or not so slight; in some cases it was as much as 3 ms faster). It did help in my tests with trading API scanners.

…Come to think of it, I should really polish these projects and publish them; they’d make good portfolio / bragging-rights pieces.

Very interesting, I’d be curious to see some examples. From what I had seen, for short-lived operations the GC efficiency of just exiting the process was a net win.

I’ll finally have some free time this year and promise to assemble some for you and whoever is interested. Remind me if I forget, it’s an area of interest to me – but my curse is that I operate with limited energy and never finish those things once I get the information that I wanted (which is always at the 80% - 90% mark).


That can very easily be true, my tests weren’t super deep. Thanks for putting this on my radar, I’ll think on how I can include a realistic workload.

Yes please.

Thanks for the response. In my case, the processes would ideally not be sitting idle, as there would be an infinite stream of payloads (in my case company IDs, which I’ll just keep looping over) for them to work on. The alternatives I could think of were:

  • spawning a new process for each incoming payload, or
  • keeping the processes alive and continually passing payloads to them (rough sketches of both are below).
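
In code, the two options look roughly like this (names are hypothetical):

```elixir
defmodule MyApp.Submission do
  @moduledoc "Sketch of the two submission styles above."

  # Alternative 1: spawn a short-lived process per payload.
  # Assumes {Task.Supervisor, name: MyApp.WorkerTaskSup} is in the supervision tree.
  def submit_spawning(company_id) do
    Task.Supervisor.start_child(MyApp.WorkerTaskSup, fn ->
      MyApp.Worker.process_company(company_id)
    end)
  end

  # Alternative 2: keep a long-lived worker alive and keep passing payloads to it.
  def submit_to_long_lived(company_id) do
    GenServer.cast(MyApp.LongLivedWorker, {:process, company_id})
  end
end
```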

As for the anti-patterns, I’ll give the attached doc a thorough read.
I’ll also try to update this topic with whatever implementation I end up with.