Oban Pro Relay and `:infinity` timeouts

AHBruns · October 23, 2023, 11:15pm

So, I’ve been using Oban.Pro, and specifically Oban.Pro.Relay recently to run API requests to 3rd party services. The idea being I can have a queue per third party service, then place a global rate limit on it which matches that service’s rate limit.

My question is, is it safe to use a timeout on my await call of :infinity?

There are times when, if I make an API call, I want my process to wait indefinitely for the API call’s response. However, I don’t want to deal with bugs caused by processes hanging forever because Relay.await missed a response. Does Relay.async guarantee that, once it returns, the job it enqueued will eventually run, and does Relay.await guarantee that it will return once the underlying job has run?

benwilson512 · October 24, 2023, 12:04am

This depends a lot on what the process is doing, but in general you want to avoid infinite awaits and instead rely on an async method of completion notification.

AHBruns · October 24, 2023, 2:37am

To be clear, this is not about the await call itself. I could, for example, use Relay.async from a GenServer and handle the response messages myself similar to how one might interact with a Task.

Choosing to use Relay.await and block on a response vs listening for a response in a non-blocking manner is orthogonal to this question.

The question is whether Relay will reliably return a response, or whether it is possible for a response to be dropped (or infinitely delayed). Any idea @sorentwo ?

lud · October 23, 2023, 11:54pm

Hello @sorentwo , regarding the snooze behaviour, I would like to know if snoozing with zero will put the job at the “end” of the queue (jobs with same queue name and same priority) if all the jobs were enqueued with the default timestamp (no schedule_in option given for any job).

@AHBruns I don’t know Relay, but imagine that something odd happens with the network, unrelated to Oban, and your API call never returns. Then your job will never finish, and if you want to stop and restart the BEAM it will have to be killed. It’s best not to have to deal with that, so a generous timeout on the API call is better, and the timeout for awaiting the job should be that same timeout plus a couple of seconds.

I don’t know if it is configurable in Oban but you would have to set the same timeout for children termination in Oban supervisors somewhere.

sorentwo · October 24, 2023, 1:36am

Jobs always run in the order they were scheduled, assuming the same priority. When jobs all have the same timestamp then the id acts as a secondary sort, where earlier jobs run first.

Snoozing with a 0 timeout will reschedule it at the current time. At that point, all the usual ordering applies, e.g. the snoozed job will run after anything inserted before it.
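As a rough illustration of the ordering described above, here is a small in-memory sketch. It is not Oban's actual query: the `OrderingSketch` module, the job maps, and the timestamps are all hypothetical, and it only assumes that jobs run in ascending order of priority, then scheduled time, then id.

```elixir
defmodule OrderingSketch do
  # Hypothetical model: a "job" is a map with :id, :priority and :scheduled_at.
  # Assumption: jobs run in ascending {priority, scheduled_at, id} order.
  def run_order(jobs) do
    jobs
    |> Enum.sort_by(fn job ->
      {job.priority, DateTime.to_unix(job.scheduled_at, :microsecond), job.id}
    end)
    |> Enum.map(& &1.id)
  end

  # Snoozing with 0 is modelled as rescheduling the job at the current time.
  def snooze(job, now), do: %{job | scheduled_at: now}
end

inserted_at = ~U[2023-10-24 01:00:00Z]
jobs = for id <- 1..3, do: %{id: id, priority: 0, scheduled_at: inserted_at}

# Job 1 starts running, then snoozes five seconds later.
[first | rest] = jobs
snoozed = OrderingSketch.snooze(first, ~U[2023-10-24 01:00:05Z])

IO.inspect(OrderingSketch.run_order([snoozed | rest]))
# prints [2, 3, 1]
```

In this model the snoozed job lands behind jobs 2 and 3 because its new timestamp is later than theirs, while the id only breaks ties between identical timestamps.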

AHBruns · October 24, 2023, 2:47am

My question was moved, so I’m not sure if this is the best place to continue the discussion, but I’ve designed my jobs to always return. In the worst case, the API call they are running hits its timeout, and then the job returns.

The problem with just setting a larger timeout on my Relay.await call is twofold:

1. Since my queues are rate limited, API requests can “back up”. This means that while the job itself may only take X seconds at most, it can take an arbitrary amount of time to start.
2. Relay.await calls can nest arbitrarily. E.g. I might Relay.await job A, which is some high-level process (e.g. “onboard customer”), then in job A I might Relay.await 2 jobs, B and C, which each execute an API call. The result is that the theoretical max run time for A is now something like max(B) + max(C) + some buffer. This would quickly become a pain to track, even if issue 1 didn’t already make it impossible.

benwilson512 · October 24, 2023, 2:50am

Hey all sorry for the forum noise, all the relevant posts should be here now. @sorentwo requested that I move these posts since he wanted a chance to dig into the questions here in more depth without cluttering up the general thread.

AHBruns · October 24, 2023, 2:52am

Apologies for the hassle!

sorentwo · October 24, 2023, 3:31am

Relay uses normal queue functionality to insert the job and await execution. If that queue is backed up then it may take a little while before the job executes. Once the job has been processed, the response is broadcast back to the listening process. Assuming the process is still listening, it will receive the reply.

I recommend using a shorter await timeout and re-awaiting a few times so the process doesn’t hang too long.
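A minimal sketch of that re-await pattern, written against a plain function so it runs without Oban Pro. The `ReAwait` module is hypothetical, and it assumes the await call returns `{:error, :timeout}` when the timeout elapses; check the Relay docs for the exact return shapes before relying on it.

```elixir
defmodule ReAwait do
  # await_fun: a 1-arity function that takes a timeout in milliseconds and
  # returns {:ok, result}, {:error, :timeout}, or some other error tuple.
  # Retries only on timeout, up to `attempts` times in total.
  def await(await_fun, timeout_ms, attempts) when attempts > 0 do
    case await_fun.(timeout_ms) do
      {:error, :timeout} when attempts > 1 ->
        # Still waiting: a good place to log, check a deadline, or bail out.
        await(await_fun, timeout_ms, attempts - 1)

      other ->
        # A result, a non-timeout error, or the final timeout.
        other
    end
  end
end
```

With Relay this might be called as `ReAwait.await(&Oban.Pro.Relay.await(relay, &1), 5_000, 12)`, where `relay` is the value returned by `Relay.async`. The caller never blocks for more than one short timeout at a time, and the total wait is bounded by `timeout_ms * attempts`.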