Hi,
I’m writing some sort of data-processing in a project, and I suspect I’m headed in the wrong direction with my design: I’m hoping the community can point me in the right direction and right my design wrongs before it’s too late
Essentially, I have a group of elements, and I want to process them in batches. For the sake of an example, let’s say I have lots of companies, with each company having a collection of employee records. The processing for each company is triggered (by an Oban job, but that shouldn’t be relevant) and all employee records for that company should be processed.
My idea is to process the records in batches with a fixed size. The reason for this is that the number of employees per company can vary “significantly”: some may have only a few hundred, while others will have tens of thousands. By having loading the employees in batches, the resource consumption should be more linear, and I should be able to increase concurrency by increasing the number of companies processed simultaneously (please correct me if my reasoning is wrong). Without a batching approach, increasing concurrency in this manner would be hit or miss: loading a company with many employees could suddenly require too many resources.
To achieve this, I was planning on having a GenServer manage the processing of all employee records for a given company: it would be started with a company id, and then start its own Task.Supervisor to farm out the processing of each employee record via async_nolink/3
. As the tasks can fail (e.g. due to DB-related reasons), the GenServer would match on messages from the Task.Supervisor (as suggested here) and keep track of the employee records that were processed successfully or failed. The failed ones would be retried (if the error was transient), or logged and dropped. Then, the next batch of employee records would be fetched, and go through the logic above. Once all employee records for the company had been processed, the GenServer would stop: the calling process would be waiting for the GenServer’s result with a receive
block and respond appropriately (mark job as success/failure).
What makes me suspect that this isn’t the proper approach is that Task.Supervisor cannot be stopped (i.e. it has no stop
function documented, as opposed to e.g. GenServer). Presumably, this is because instances of Task.Supervisor
are meant to be started in the supervision tree and just keep going as long as the OTP app in alive (like other “utility” processes like Registry
).
How then should I rethink my approach? There appears to be no equivalent to Task.Supervisor
's async_nolink/3
within the Task
module. Should I have my GenServer trap exits and use Task.async/1
to reproduce the above approach (i.e. matching task results in the GenServer’s handle_info
)? Should I spawn_monitor/1
? Given I plan to have the GenServer handle results in its handle_info callback, would there be any advantage to use a Task at all over spawn_monitor
? Is my approach entirely wrong?
I guess Task.Supervisor
is the wrong tool for what I’m trying to accomplish: although I want failed tasks to be restarted, I need more control than Task.Supervisor
afford me… So let me know how best to approach this: I’m currently leaning towards a spawn_monitor
approach, but I’d appreciate any insight you may offer.