How do I properly rate-limit my own requests to an external API?

I use Oban to make external API requests every N minutes. It’s turned out that this frequency is too high: it triggers a “too many requests” error from the API, since the API enforces a rate limit.

How would I properly throttle my requests? I could a) increase N, or b) insert a few Process.sleep(10_000) calls inside the code that makes the requests.

An issue with (a) is that as my DB grows, I’ll have to keep increasing N too, by trial and error.

An issue with (b) is that if Process.sleep(M) is too large, the next iteration of Oban.perform(...) may start while the current one is still executing. If it’s too small, it may still cause an API limit exceeded error.

How to do this properly? And in a simple manner too.

You could discard the current request that got a “too many requests” error:

  1. Pattern match on the error body or HTTP status
  2. If it’s a “too many requests” error, return {:cancel, reason} from the Oban worker
  3. The job will be canceled, and the next scheduled run (every N minutes) will try again
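A minimal sketch of those steps, where MyApp.Client.call/1 is a placeholder for whatever HTTP client wrapper you already have:

```elixir
defmodule MyApp.ApiWorker do
  use Oban.Worker, queue: :api

  @impl Oban.Worker
  def perform(%Oban.Job{args: args}) do
    # MyApp.Client.call/1 stands in for your HTTP call; the point is matching
    # on the 429 status (or the "too many requests" error body).
    case MyApp.Client.call(args) do
      {:ok, _response} ->
        :ok

      {:error, %{status: 429}} ->
        # Cancel this job; the next cron run (every N minutes) tries again.
        {:cancel, :rate_limited}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```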

This is a simple solution I can think of without much knowledge of the problem.

It’s not a single API request per perform(...) but multiple ones. Namely, in each perform(...) I iterate over DB rows, making multiple API requests.

I don’t know much about how Oban works under the hood, but you could see if there’s a way to integrate Hammer into the execution loop. It’s a rate-limiter for Elixir that acts as a kind of traffic signaling device.
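For example, a hedged sketch of wrapping each call in the row loop with Hammer’s classic check_rate/3 API (newer Hammer versions restructure this, and MyApp.Client.call/1 is again a placeholder):

```elixir
defp request_with_limit(row) do
  # Allow at most 30 calls to the external API per 60-second window.
  case Hammer.check_rate("external-api", 60_000, 30) do
    {:allow, _count} ->
      MyApp.Client.call(row)

    {:deny, _limit} ->
      # Window exhausted: back off briefly, then retry this row.
      Process.sleep(5_000)
      request_with_limit(row)
  end
end
```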

2 Likes

It seems like you need to throttle the requests made between records inside the perform call.

Is there a way to ask Oban to start the next job N minutes after the current one finishes execution? Instead of scheduling it on a predefined interval?

Also, you may need to “pause” between each API call within the perform function.
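Something as simple as this inside perform/1, with MyApp.Client.call/1 standing in for the actual request:

```elixir
Enum.each(rows, fn row ->
  MyApp.Client.call(row)
  # Fixed pause between calls to stay under the provider's limit.
  Process.sleep(2_000)
end)
```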

I always try to create a plain GenServer for things like this, since it’s easier to handle, and reach for Oban when I need to deal with more complicated scenarios.

1 Like

That would be an overcomplication.

I don’t know. Is there?

Eliminate the Oban cron plugin and instead insert a new job manually, dynamically, at the end of perform(...)? Huh?

Can you elaborate on why you think this?

Oban Pro has rate limiting (Smart Engine — Oban v2.11.0). That, however, operates at the job level, not the API request level. If those are roughly 1:1, though, it might work fine.

5 Likes

That might work. Combining this with a pause between each request could solve the issue. Having said that, maybe there’s a better “perform” logic: one API request per job somehow? Querying only as many records as the API’s threshold allows and scheduling the next job accordingly?
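A hedged sketch of that fan-out idea: the cron-driven job only enqueues work, one job per row, staggered so that each enqueued job makes a single API call (MyApp.Row and MyApp.RequestWorker are hypothetical names):

```elixir
defmodule MyApp.FanOutWorker do
  use Oban.Worker, queue: :fan_out

  import Ecto.Query

  @impl Oban.Worker
  def perform(_job) do
    query = from r in MyApp.Row, select: r.id

    query
    |> MyApp.Repo.all()
    |> Enum.with_index()
    |> Enum.each(fn {id, index} ->
      # Spread the per-row jobs out over time instead of firing them at once.
      %{row_id: id}
      |> MyApp.RequestWorker.new(schedule_in: index * 5)
      |> Oban.insert()
    end)

    :ok
  end
end
```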

I’ve done something 99% the same a long time ago, GenServer-based, but I’m not willing to dig it up at the moment.

You could write your own GenServer that is responsible for contacting the 3rd-party API (when you send it a message) and have it preserve state that tracks how many requests you have left for, e.g., the next 5 minutes; only when you are about to hit the rate limit do you Process.sleep.

Furthermore, using a GenServer for this immediately rids you of any potential race conditions, i.e. making 2 or more requests just before you hit the rate limit, because sending messages to a GenServer is serial, on a first-come-first-served (FIFO) basis.
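A rough sketch of that idea, with purely illustrative limits; callers wrap each API call as MyApp.ApiGate.request(fn -> MyApp.Client.call(row) end):

```elixir
defmodule MyApp.ApiGate do
  # All API calls funnel through this process. It tracks how many calls were
  # made in the current window and sleeps before a call that would exceed the
  # limit. The limit and window below are made-up numbers.
  use GenServer

  @limit 30
  @window_ms 60_000

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def request(fun) when is_function(fun, 0),
    do: GenServer.call(__MODULE__, {:request, fun}, :infinity)

  @impl true
  def init(_), do: {:ok, %{count: 0, window_started: now()}}

  @impl true
  def handle_call({:request, fun}, _from, state) do
    state = maybe_reset(state)

    state =
      if state.count >= @limit do
        # About to exceed the limit: sleep out the rest of the window.
        Process.sleep(@window_ms - (now() - state.window_started))
        %{count: 0, window_started: now()}
      else
        state
      end

    {:reply, fun.(), %{state | count: state.count + 1}}
  end

  defp maybe_reset(%{window_started: started} = state) do
    if now() - started >= @window_ms, do: %{count: 0, window_started: now()}, else: state
  end

  defp now(), do: System.monotonic_time(:millisecond)
end
```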

Or you can use opq or :jobs (that one is in Erlang but fairly easy to use). I have used both successfully.

FWIW, I am not a huge fan of Oban even though it works perfectly; I feel it confuses people, so I never reach for it unless I need persistence for the jobs. That is a real requirement a good chunk of the time, though, so maybe you’re better off just using Oban with uniqueness rules and maximum concurrency settings. That works quite fine as well.

2 Likes

An additional dependency that hasn’t yet been shown to even be required.

There are a couple of things that may work for you; it’s difficult to say what the best solution would be without knowing more about the specifics of the problem.

Yes, Oban supports scheduling, either after N seconds or at a specific datetime:

https://hexdocs.pm/oban/Oban.html#module-scheduling-jobs
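From those docs, scheduling looks roughly like this (MyApp.ApiWorker stands in for your own worker module):

```elixir
# Run 10 minutes from now.
%{row_id: 123}
|> MyApp.ApiWorker.new(schedule_in: 600)
|> Oban.insert()

# Or at an exact timestamp.
%{row_id: 123}
|> MyApp.ApiWorker.new(scheduled_at: DateTime.add(DateTime.utc_now(), 600, :second))
|> Oban.insert()
```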

If this is the route you go for, make sure you read the “Reliable Scheduling” docs to avoid some unexpected behaviour:

https://hexdocs.pm/oban/reliable-scheduling.html

Something else to look at is custom backoff:

https://hexdocs.pm/oban/Oban.Worker.html#module-contextual-backoff
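For instance, overriding the worker’s backoff/1 callback (the numbers here are arbitrary) so retries spread out once the API starts rejecting requests:

```elixir
@impl Oban.Worker
def backoff(%Oban.Job{attempt: attempt}) do
  # Delay in seconds before the next retry: 32, 34, 38, 46, 62, ...
  trunc(:math.pow(2, attempt)) + 30
end
```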

Hope that helps.

2 Likes

People have already mentioned Oban.

I’ll just point to a real world config example I came across recently:

2 Likes

As I’ve explained, I’m using Oban in cron mode. Therefore, it’s already scheduled, with repetition, statically via the config.
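For reference, that static setup looks roughly like this in config (the interval and worker name here are placeholders):

```elixir
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [api: 1],
  plugins: [
    {Oban.Plugins.Cron,
     crontab: [
       # Kick off the API sync every 10 minutes.
       {"*/10 * * * *", MyApp.ApiWorker}
     ]}
  ]
```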

So Oban is capable of slowing down the requests on its own then? This will work for me.

However, what will happen if

the next iteration of Oban.perform(...) starts while the current one is STILL being executed

?

As we don’t actively proselytize, this is a marvelous compliment. We’ll take it!

7 Likes

Yes, but only with Pro’s Smart Engine. In OSS you’d be manually chaining jobs, handling edge cases, etc. yourself.
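Manual chaining in OSS can be as simple as enqueueing the next run at the end of perform/1 instead of relying on the cron plugin; a rough sketch:

```elixir
@impl Oban.Worker
def perform(%Oban.Job{} = _job) do
  # process_all_rows/0 is a placeholder for the existing per-row request loop.
  process_all_rows()

  # Schedule the next run N minutes after this one finishes, so runs
  # never overlap the way fixed cron intervals can.
  %{}
  |> __MODULE__.new(schedule_in: 10 * 60)
  |> Oban.insert()

  :ok
end
```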

1 Like

Hey :wave:
a bit late to the party, but I think it’s worth sharing this library, Regulator, which provides adaptive concurrency limits around external resources.

It does not integrate with Oban, but it might be worth a look to understand how they implemented the logic.

Cheers :v:

1 Like

Oh, that’s a good one; bookmarked it right away. Not surprising, though, because the author is fantastic.

1 Like

You could also check out ExWaiter — ex_waiter v1.3.1

For better or worse, it’s not opinionated about how you keep track of whether requests can be made.

1 Like