Good solution for launching asynchronous tasks with rate limiting

Hello,
This is my first question here, please let me know if it can be improved.

Every night, I would like my server to run a validation process on a number of URLs. It computes some hashes, downloads some data, validates it, etc.

It looks like tasks are best suited for this. I’ve also read that supervised tasks are better than unsupervised ones, because they are cleaned up properly if the main process is terminated. Can you confirm that I’m heading in the right direction so far?

Also, my server is running on a small instance, and I’m worried that if I give it a big list of tasks, it will crash because of the limited resources available. I was hoping that Task.Supervisor would have an option to cap the maximum number of concurrently running tasks, but I didn’t see one. Any information about this?

I saw that async_stream has a concurrency limit option, but I didn’t understand whether it could help in my situation. Can it? And is it compatible with a Task.Supervisor?

Probably many newbie questions here; thanks for your time!


Do not worry. The BEAM uses a preemptive scheduler (at least from the user’s point of view), so it should be OK to spawn a new process per URL (depending on the scale) and just let the VM do its magic. Erlang processes are quite cheap, so this should not be a problem for a list of hundreds of URLs; I would only start thinking about throttling once you get past roughly 1k of them. In that case you can use GenStage, which allows for rate limiting.
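For illustration, a minimal sketch of the process-per-URL approach, assuming a Task.Supervisor named MyApp.TaskSupervisor in the application’s supervision tree and a hypothetical validate_url/1 that does the actual hash/download/validate work:

```elixir
# Assumption: a Task.Supervisor is already started in the application's
# supervision tree, e.g.
#
#   children = [
#     {Task.Supervisor, name: MyApp.TaskSupervisor}
#   ]

defmodule MyApp.Validator do
  # Spawn one supervised (but unlinked) task per URL, then wait
  # up to 5 minutes for the results.
  def validate_all(urls) do
    urls
    |> Enum.map(fn url ->
      Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> validate_url(url) end)
    end)
    |> Task.yield_many(:timer.minutes(5))
  end

  # Placeholder for the actual hash/download/validate work.
  defp validate_url(url), do: {:ok, url}
end
```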


I use Task.Supervisor.async_stream_nolink/6 in several places for this sort of thing. The tasks are supervised by the supervisor given as the first argument, but they are not linked to the caller. In the caller, I then Enum.reduce over the resulting stream, collect a little “report” of the number of successes, errors, etc., and log the result.

I typically set the max_concurrency and ordered: false options. This concurrency throttling mostly helps me avoid overwhelming some external API that the tasks call; it’s not because I’m worried about the BEAM handling a large number of tasks.
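For reference, a sketch of that pattern, assuming a Task.Supervisor named MyApp.TaskSupervisor and a hypothetical validate_url/1, and using the function-based async_stream_nolink/4 variant for brevity:

```elixir
defmodule MyApp.NightlyValidation do
  require Logger

  # Run the validation with bounded concurrency and collect a small report
  # of how many URLs succeeded, failed, or crashed/timed out.
  def run(urls) do
    MyApp.TaskSupervisor
    |> Task.Supervisor.async_stream_nolink(urls, &validate_url/1,
      max_concurrency: 10,
      ordered: false,
      timeout: :timer.seconds(30),
      on_timeout: :kill_task
    )
    |> Enum.reduce(%{ok: 0, error: 0, exit: 0}, fn
      {:ok, {:ok, _}}, acc -> Map.update!(acc, :ok, &(&1 + 1))
      {:ok, {:error, _}}, acc -> Map.update!(acc, :error, &(&1 + 1))
      {:exit, _reason}, acc -> Map.update!(acc, :exit, &(&1 + 1))
    end)
    |> tap(&Logger.info("Nightly validation report: #{inspect(&1)}"))
  end

  # Placeholder for the actual per-URL work; returns {:ok, _} or {:error, _}.
  defp validate_url(_url), do: {:ok, :valid}
end
```

With on_timeout: :kill_task, a task that exceeds the timeout shows up in the stream as an {:exit, :timeout} entry instead of bringing down the stream, which keeps the report complete.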
