Spawning Task vs enqueuing another Oban worker from within an Oban worker

visezoly · September 14, 2023, 2:10pm

Situation:

I have a list of posts, and each post has many comments.
I want to process each post in an Oban worker. For the post itself and for each of its comments, I want to do some work, for instance calling an external API and then saving the result to my DB (so if a post has five comments, that one worker makes six API calls).

And here is the question: what should I do when a post has a lot of comments?
Would it help to spawn a Task for each comment so that each API call runs in its own process?
Or are there other recommendations for solving this in ‘an Elixir way’?

dimitarvp · September 14, 2023, 2:43pm

Spawning extra tasks, especially when they call an external API, carries the real risk of them timing out and taking the Oban job down with them. You can increase Oban’s timeout, sure, but I’d personally just queue extra Oban jobs from within the job that works on a post: have it do its work on the post itself and then queue one more Oban job per comment (e.g. 5 jobs if the post has 5 comments).
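A minimal sketch of that fan-out, assuming Oban is already configured in the application. `MyApp.Posts`, `MyApp.ExternalApi`, their functions, the worker module names, and the queue names are all hypothetical placeholders, not something from this thread:

```elixir
defmodule MyApp.Workers.PostWorker do
  # Hypothetical parent worker: handles the post, then enqueues one job per comment.
  use Oban.Worker, queue: :posts, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"post_id" => post_id}}) do
    # MyApp.Posts and MyApp.ExternalApi are assumed application modules.
    post = MyApp.Posts.get_post_with_comments!(post_id)

    with {:ok, result} <- MyApp.ExternalApi.analyze(post.body),
         {:ok, _post} <- MyApp.Posts.save_result(post, result) do
      # Build one changeset per comment and insert them in a single call.
      post.comments
      |> Enum.map(&MyApp.Workers.CommentWorker.new(%{comment_id: &1.id}))
      |> Oban.insert_all()

      :ok
    end
  end
end

defmodule MyApp.Workers.CommentWorker do
  # Hypothetical child worker: one API call and one DB write per comment.
  use Oban.Worker, queue: :comments, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"comment_id" => comment_id}}) do
    comment = MyApp.Posts.get_comment!(comment_id)

    # If either step returns {:error, reason}, `with` passes it through and
    # Oban retries only this comment's job, not the whole post.
    with {:ok, result} <- MyApp.ExternalApi.analyze(comment.body),
         {:ok, _comment} <- MyApp.Posts.save_result(comment, result) do
      :ok
    end
  end
end
```

Note that job args are stored as JSON, so they arrive in `perform/1` with string keys even though they were inserted with atom keys. A separate queue for comment jobs lets you cap how many API calls run at once through that queue's concurrency limit.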

Though it has to be said that if the information you want to store in the DB is aggregated (i.e. one database row in another table whose contents are determined by the 1 post together with its 5 comments), then it’s likely better to just use Elixir’s parallelism inside the post’s Oban job.
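For the aggregated case, one way to run the calls in parallel inside a single job is `Task.async_stream/3` with bounded concurrency. The sketch below is only an illustration: the `FanOut` module and its defaults are made up, and it assumes the function passed in returns `{:ok, value}` or `{:error, reason}`.

```elixir
defmodule FanOut do
  @doc """
  Runs `fun` on every item concurrently and collects the results in input order.
  `fun` must return {:ok, value} or {:error, reason}.
  Returns {:ok, values} if everything succeeded, otherwise {:error, reasons}.
  """
  def run(items, fun, opts \\ []) do
    stream_opts = [
      max_concurrency: Keyword.get(opts, :max_concurrency, 5),
      timeout: Keyword.get(opts, :timeout, 10_000),
      # Kill a slow task and emit {:exit, :timeout} instead of exiting the caller.
      on_timeout: :kill_task
    ]

    results =
      items
      |> Task.async_stream(fun, stream_opts)
      |> Enum.map(fn
        {:ok, {:ok, value}} -> {:ok, value}
        {:ok, {:error, reason}} -> {:error, reason}
        {:exit, reason} -> {:error, reason}
      end)

    case Enum.split_with(results, &match?({:ok, _}, &1)) do
      {oks, []} -> {:ok, Enum.map(oks, fn {:ok, value} -> value end)}
      {_oks, errors} -> {:error, Enum.map(errors, fn {:error, reason} -> reason end)}
    end
  end
end
```

Inside the post's `perform/1` you could call something like `FanOut.run(post.comments, &call_api/1)` (with `call_api/1` being your own function), build the aggregate row from the `{:ok, values}` result, and return `{:error, reasons}` otherwise so Oban retries the job. The `on_timeout: :kill_task` option is what keeps a single slow API call from exiting the job process.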

visezoly · September 14, 2023, 3:33pm

Thank you for clarifying, so just like in this article:

So a Task alone might be too primitive?
However, even if it crashes, the Oban worker can be retried and resumed, if I understand correctly.
Is that the whole purpose of using Oban here, or is there some deeper explanation?

Moving on, how do I properly enqueue several Oban jobs from within an Oban worker so that retries are handled correctly if, for example, an API call returns an error? Are there any helpful resources or open source examples?

benwilson512 · September 14, 2023, 3:47pm

You can simply insert more Oban jobs from within the job that is currently running.

It will be retried, but the whole job will be retried, including the API calls that already succeeded. If you are tracking which API calls you’ve made for which comments, then that’s fine: you can skip those. If the API calls are idempotent, then you’re also fine. If neither of those is true, though, then you probably want a job per API call.
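As a rough sketch of the "track and skip" option: the `ResumableBatch` module, the `:processed_at` field, and the `call_api` / `mark_done` callbacks below are all hypothetical; in a real app `mark_done` would be a DB update.

```elixir
defmodule ResumableBatch do
  @doc """
  Processes only the comments that are not yet marked as done.
  Stops at the first failure and returns {:error, {comment_id, reason}},
  so the calling Oban job can return that value and be retried later.
  """
  def run(comments, call_api, mark_done) do
    comments
    # Skip comments handled on a previous attempt (hypothetical :processed_at field).
    |> Enum.reject(& &1.processed_at)
    |> Enum.reduce_while(:ok, fn comment, :ok ->
      with {:ok, result} <- call_api.(comment),
           {:ok, _} <- mark_done.(comment, result) do
        {:cont, :ok}
      else
        {:error, reason} -> {:halt, {:error, {comment.id, reason}}}
      end
    end)
  end
end
```

Because each comment is marked as done right after its own API call succeeds, a retry of the whole job picks up where the previous attempt stopped instead of repeating every call.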

visezoly · September 14, 2023, 3:54pm

Yes, the API calls are idempotent. In that case, is there any real difference between making all the calls from within one worker vs. many workers? At what scale (number of comments?) does it make sense to split them up?