How to rate limit with Oban?

We are using Oban rate limiting to ensure we don’t overload an external service. The service has a rate limit of no more than 100 requests in any given 10 second window. We make all requests in Oban jobs, and rate limit the queue like so:

rate_limit: [
  # reserve 10% of our allotted requests for other uses (e.g. manual API calls)
  allowed: 90,
  period: 10,
  partition: [
    fields: [:args],
    keys: [:profile]
  ]
]
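
For reference, a hedged sketch of how that snippet sits inside the full queue configuration (the queue name, local limit, and engine module are assumptions, and the engine module name varies between Oban Pro versions):

config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  repo: MyApp.Repo,
  queues: [
    external_api: [
      # how many jobs a single node may execute at once
      local_limit: 10,
      # the partitioned rate limit from above
      rate_limit: [
        allowed: 90,
        period: 10,
        partition: [
          fields: [:args],
          keys: [:profile]
        ]
      ]
    ]
  ]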

Somehow we’re still hitting the API’s rate limit. We believe the issue lies in Oban’s rate limiting semantics. Basically, we’re wondering if Oban limits how many jobs can be running in a given window, or if it limits how many jobs can be started in a given window.

E.g. if a queue has a rate limit of 10 jobs with a window of 10 seconds, and 10 jobs are currently running but were started an hour ago, will a newly enqueued job be started right away, or will it wait for one of the currently running jobs to stop?

As far as I understand Partitioned Rate-Limiting, the limit is per partition.

Yes, I realize it’s per partition (we partition on the :profile arg and the API we’re calling is rate limited per profile), but what I’m asking is what it’s actually counting. Is it counting the number of jobs started in a given window, or the number of jobs that ran in a given window (even if they were started outside that window)?

It is counting jobs started during that window. Limiting concurrent execution requires the global limit instead.

It sounds like there is a mismatch between the number of allowed jobs, the rate limit, and the time it takes to execute a job.
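
To illustrate the distinction with a hedged sketch (the queue name and numbers are made up; both options are Oban Pro Smart Engine features):

queues: [
  external_api: [
    local_limit: 10,
    # rate_limit counts how many jobs are *started* in each 10 second window
    rate_limit: [allowed: 90, period: 10],
    # global_limit caps how many jobs may be *executing* at once across all nodes
    global_limit: 20
  ]
]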

Related question: if we strictly need, e.g., a maximum of 100 tasks executed per second (again, a 3rd party API rate limit), is this firmly in Oban Pro territory, or can it be achieved with the free engine as well? So far I’ve just opted for specifying a limit of 100 for my queue, which in my mind means only 100 jobs would ever be picked from the queue (even if it has e.g. 2000 enqueued) and executed concurrently / in parallel, but I might have misread the docs.

Would you please shed some light on this?

This is good to know, though it does bring up the question of how to rate limit the number of jobs that ran in a given window rather than the number of jobs that were started in a given window.

My solution was to use a global limit of 20 plus a rate limit of 70, which should ensure no more than 90 jobs run in any given window (at most 20 started before the window and 70 started during it = 90); see the sketch after the list below. This has two major downsides.

  1. The global limit is, well, global, whereas my rate limit is per partition. This means that if I have, say, 40 partitions, each getting 1 job per second, and my jobs take 1 second to run on average, I’m going to fall behind despite none of them being anywhere near their partition’s rate limit! I would at least like to be able to have a per partition concurrency limit.

  2. Even if I could set a per partition concurrency limit, I still have to accept lower max throughput, because in order to ensure I never have more than X jobs run in a given window, I have to set my rate limit to X minus my concurrency limit to account for jobs that were started, but did not finish, before the window started.
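
For concreteness, the workaround reads roughly like this (a hedged sketch of a Smart Engine queue; the keys and numbers mirror the description above):

queues: [
  external_api: [
    local_limit: 20,
    # at most 20 jobs executing concurrently, queue-wide
    global_limit: 20,
    # plus at most 70 new starts per 10 second window, per profile
    rate_limit: [
      allowed: 70,
      period: 10,
      partition: [fields: [:args], keys: [:profile]]
    ]
  ]
]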

Are there any plans to introduce window-based concurrency limiting (I won’t call it rate limiting since that has a different meaning)? How would you suggest achieving the goal of only ever having X jobs run in a given window per partition without reducing max throughput?

My ideal (and how I thought rate limiting worked) was that I could configure a queue to limit concurrency over a window to a given number for a given partition. E.g.

concurrency_rate_limit: [
  # only 90 different jobs will run during any given window, this includes jobs that started, but did not finish, before the window started
  allowed: 90,
  period: 10,
  partition: [
    fields: [:args],
    keys: [:profile]
  ]
]

Global limits do apply per-partition, but the issue is with combining the rate limit and global partitions. It’s prevented in config now because while technically possible, the combination melts my brain.

Have you considered bumping the period to match the amount of time jobs take to run? Meaning, instead of 90 per 10 seconds where jobs take 20 seconds to run, make it 45 every 20 seconds to approximate the runtime.

Anything that Pro does can be achieved using the OSS version, but you’ll need to do the heavy lifting yourself and use a separate rate limiter. That means you rate limit in your application somehow and snooze jobs that are over the rate limit. It’s then extra effort to make that global, plus it causes churn while jobs transition between states (executing → scheduled → available) in a loop.
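
A minimal sketch of that approach, assuming a hypothetical MyApp.RateLimiter that tracks request counts locally (e.g. backed by :ets or a library such as Hammer):

defmodule MyApp.ApiWorker do
  use Oban.Worker, queue: :external_api

  @impl Oban.Worker
  def perform(%Oban.Job{args: args}) do
    # MyApp.RateLimiter is hypothetical; it returns :ok while under the limit
    case MyApp.RateLimiter.check("external_api", limit: 100, window_ms: 10_000) do
      :ok ->
        make_request(args)

      {:error, :rate_limited} ->
        # snoozing sends the job back through scheduled -> available, the churn mentioned above
        {:snooze, 5}
    end
  end

  defp make_request(_args), do: :ok
end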

That’s exactly how it works, for a single node. Each node running the queue could run 100 jobs in parallel. If you do a rolling deploy, or run multiple nodes, you’ll exceed that limit.

Thank you. For now I am on a single node so this is good enough. Appreciate you taking the time to make it clear.

It’s prevented in config now because while technically possible, the combination melts my brain.

Fair enough.

Have you considered bumping the period to match the amount of time jobs take to run? Meaning, instead of 90 per 10 seconds where jobs take 20 seconds to run, make it 45 every 20 seconds to approximate the runtime.

The issue is that my jobs are lumpy in the amount of time they take. Basically, we use Finch to make our HTTP calls. Sometimes the TCP connection Finch uses for the HTTP request has been closed, and the call fails; when that happens we rely on Tesla retries to ensure the request goes through.
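
For context, the retry setup looks roughly like this (a hedged sketch; the module name, base URL, and retry settings are assumptions):

defmodule MyApp.ApiClient do
  use Tesla

  plug Tesla.Middleware.BaseUrl, "https://api.example.com"
  plug Tesla.Middleware.JSON

  # retry transport errors (e.g. a closed Finch connection) instead of failing the Oban job
  plug Tesla.Middleware.Retry,
    delay: 200,
    max_retries: 2,
    should_retry: fn
      {:error, _reason} -> true
      {:ok, %Tesla.Env{status: status}} when status >= 500 -> true
      {:ok, _env} -> false
    end

  adapter Tesla.Adapter.Finch, name: MyApp.Finch
end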

What this means is that 99% of the time our request will fire basically instantly after the job starts, but sometimes it will have to wait for 1 or even 2 retries to actually fire which can mean it runs multiple seconds after the job starts.

I considered doing away with Tesla retries and just letting Oban retry the job, but we’re using Relay with these jobs, and so if I do that, Relay will broadcast a message saying the job failed even though it really just needs to be tried on a new Finch connection. I realize this is very in the weeds, but basically, these jobs don’t take a consistent amount of time to run, and there’s not a particularly easy way to make them take a consistent amount of time to run.

That’s understandable. The most reliable way I can think of to handle this without exceeding the rate limit is to increase the window to compensate for the variability, despite the fact that it limits your throughput. Or accept that the limits may not match up perfectly and handle the external rate limit violation gracefully from within the job.
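
The “handle it gracefully from within the job” option could look something like this inside the worker’s perform/1 (hedged; MyApp.ApiClient.request/1 and the response shape are assumptions):

def perform(%Oban.Job{args: args}) do
  case MyApp.ApiClient.request(args) do
    # the provider says we're over its limit, so back off instead of failing
    {:ok, %{status: 429, headers: headers}} ->
      {:snooze, retry_after(headers)}

    {:ok, _response} ->
      :ok

    {:error, reason} ->
      {:error, reason}
  end
end

# fall back to 5 seconds when the provider doesn't send a Retry-After header
defp retry_after(headers) do
  case List.keyfind(headers, "retry-after", 0) do
    {_name, value} -> String.to_integer(value)
    nil -> 5
  end
end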

Still thinking about how this could be modeled or even described accurately.

Wrt modeling, my intuition would be to model job executions as time windows. Then you can use your existing sliding rate limiting window logic, except instead of asking how many jobs were started in the sliding window, you ask how many jobs had an execution window that overlapped with the sliding window. It seems to me this could be done in a similarly efficient fashion to your existing sliding window implementation.
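
As a rough illustration of that overlap query (hedged; a schemaless Ecto query against the oban_jobs table, ignoring partitioning and indexing concerns):

defmodule MyApp.WindowCount do
  import Ecto.Query

  # counts jobs whose execution interval overlapped the sliding window
  # [window_start, now]: they were started, and either are still running
  # or completed after the window opened
  def overlapping_executions(repo, queue, window_start) do
    from(j in "oban_jobs",
      where: j.queue == ^queue,
      where: not is_nil(j.attempted_at),
      where:
        is_nil(j.completed_at) or
          j.completed_at >= type(^window_start, :utc_datetime_usec),
      select: count(j.id)
    )
    |> repo.one()
  end
end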

From an API perspective, I think rate limiting, global limits, and local limits could/should all be unified into a common interface.

It’s certainly possible and something to consider.

Do you mean internally or a user facing interface?

I mean user facing. I think ultimately, all limits should come down to asking

  1. What am I counting? E.g. the number of running jobs on a given queue, the number of jobs that have been started on a given queue, the number of running jobs on a given node, something else?
  2. Over what window am I counting? E.g. a zero length window, a 10 second window, a 2 week window?
  3. What is the max count I’m allowed to hit? Aka the limit.

Every one of the existing APIs can be framed in this context.

  • a local limit is counting the number of running jobs on a given node, it’s counting over a zero length window (aka an instant), and the max count is supplied by the user
  • a global limit is counting the number of running jobs on a given queue, it’s counting over a zero length window, and the max count is user supplied
  • a rate limit is counting the number of jobs started in any given supplied partition (or the queue if none is supplied), it’s counting over a user supplied window length, and the max count is user supplied

I have a use case to count the number of running jobs on a given partition over a non zero length window, but I could imagine lots of other useful combinations. Off the top of my head:

  • counting the number of running jobs on a given partition over a zero length window (aka a partition limit rather than a global or local limit).
  • counting the number of started jobs on a given node over a non-zero length window (aka a per node rate limit)

The issue comes up in defining the combinations. E.g. what does measuring the number of jobs started over a zero length window mean? If you model time in discrete ticks, then that’s pretty easy: it’s how many jobs were started in a given tick. But if you model time as continuous, then it’s kind of a meaningless question. Imo, saying that your API is only accurate down to some fundamental tick size is pretty reasonable, and makes everything coherent. In such a world a “zero length window” really means a window that is the length of 1 tick.
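
To make that concrete, a unified interface might accept a list of limit specs answering those three questions (purely hypothetical; none of these keys exist in Oban today):

limits: [
  # the use case above: executing jobs per partition over a 10 second window
  [count: :executing, partition: [fields: [:args], keys: [:profile]], window: 10, allowed: 90],
  # a classic global limit: executing jobs on the queue over a zero length (one tick) window
  [count: :executing, window: 0, allowed: 20],
  # a classic rate limit: started jobs on the queue over a 60 second window
  [count: :started, window: 60, allowed: 600]
]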
