I would like to know how I can spawn one process that represents an "order" in my system. Each order has 1000+ images, so I would like to process each image in a separate process, but I can only say that the order "has been processed" if all of those 1000+ images were processed successfully. Is there a way to keep track of these "child processes"?
Functions like Task.await_many or Task.yield_many could do what you’re looking for.
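A rough sketch of the `Task.yield_many` idea, assuming you have an `images` list and a `process_image/1` function (both names are placeholders for whatever your system uses): it waits up to a deadline and tells you which tasks completed, so you only mark the order as processed if every image finished.

```elixir
# Spawn one task per image (see the caveats below about doing this at scale).
tasks = Enum.map(images, fn image -> Task.async(fn -> process_image(image) end) end)

# Wait up to 60 seconds for all of them; each entry is {task, result_or_nil}.
results = Task.yield_many(tasks, 60_000)

# The order is "processed" only if every task returned {:ok, _}.
all_done? =
  Enum.all?(results, fn
    {_task, {:ok, _result}} -> true
    {_task, _exit_or_nil} -> false
  end)
```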
HOWEVER
Consider carefully if you actually want to launch 1000 processes all at once, versus using a pool of workers. Unless you’re using a VERY large server, most of those 1000 processes will be waiting for their turn to run most of the time.
I will agree with the previous post though: instead of spawning 1000 processes, a wiser option would be to chunk the images and spawn fewer processes that each handle many images.
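For illustration, the chunking approach might look like this (a sketch, assuming hypothetical `images` and `process_image/1` as before): splitting 1000 images into chunks of 100 means only ~10 processes run, each working through its chunk sequentially.

```elixir
results =
  images
  |> Enum.chunk_every(100)                 # e.g. 1000 images -> 10 chunks
  |> Enum.map(fn chunk ->
    Task.async(fn -> Enum.map(chunk, &process_image/1) end)
  end)
  |> Task.await_many(:infinity)            # wait for every chunk to finish
  |> List.flatten()                        # back to one flat list of results
```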
First of all, thanks for your reply! Unfortunately, I don't think I quite get what you said. Using a pool of workers, would I be able to open an "order_process" and process all of those images, or just a few of them? Would I be able to spawn order_1, order_2, order_3 and have each of them "wait" for their own images?
I can’t speak to the part about “order” processes; that’s going to depend on where orders come from and what cares about that “order has been processed” status update.
For a particular order, there are tradeoffs between concurrency and parallel processing overhead. For concreteness let’s assume there’s a list of 1000 images named images and a function process_image that takes an image, does the thing, and returns a result.
The “maximum concurrency” approach would be to start up a new process for every element of images and then wait for all the results:
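Something along these lines (a sketch; `images` and `process_image/1` are assumed from the setup above):

```elixir
results =
  images
  |> Enum.map(fn image -> Task.async(fn -> process_image(image) end) end)
  |> Task.await_many(:infinity)
# `results` comes back in the same order as `images`. Because Task.async
# links the tasks to the caller, a crash in any task takes the caller down
# too, so reaching this line means every image was processed.
```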
If process_image does a lot of things that involve waiting for external resources, this may speed things up a lot.
On the other hand, if process_image does a lot of things that need CPU time, things won’t speed up much more than the number of schedulers in the system.
That last situation is common enough that there's a standard function to handle it better by only starting enough processes to keep the schedulers busy: Task.async_stream. Using it would look like:
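For example (a sketch with the same assumed `images` and `process_image/1`):

```elixir
results =
  images
  |> Task.async_stream(&process_image/1, timeout: :infinity)
  |> Enum.map(fn {:ok, result} -> result end)
# By default Task.async_stream runs at most System.schedulers_online()
# tasks at a time; pass max_concurrency: n to tune that.
```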
Both of these approaches will use all the processing resources available when given enough images.
If you’re expecting to handle multiple orders, this is a problem: what happens when many arrive at once? Processes are cheap on the BEAM, but not free.
This is where "worker pools" such as :poolboy or job-queuing systems like Oban are useful; they allow you to define how many workers should run simultaneously and then balance the work across them. Oban Pro's batching would be a particularly good fit for this requirement.