Using Process.sleep for rate limiting external requests?

_jonas · February 8, 2024, 3:27pm

Hey, I’ve got a quick beginner question:

I’m making requests to an external API using Finch.
The API is rate limited, and when the limit is hit, it returns a 429 response which helpfully comes with a Retry-After header that declares a time in seconds after which the request can be retried.

My naive approach would be to route all the external API calls through a GenServer. When it hits the rate limit, it would call Process.sleep/1 with the Retry-After value multiplied by 1000 (the header is in seconds, sleep/1 takes milliseconds). All subsequent requests would then queue up in the GenServer process’s message queue and complete successfully once the process finishes sleeping and the rate limit is lifted.

I am hesitating because of this line in the Process docs for sleep/1:

“Use this function with extreme care. For almost all situations where you would use sleep/1 in Elixir, there is likely a more correct, faster and precise way of achieving the same with message passing.”

Is this a situation where sleep/1 would be appropriate or is there indeed a better way?

arcanemachine · February 9, 2024, 6:56am

I’m guessing by the lack of responses that this is one of those valid use cases.

From what I understand about the BEAM (which is not nearly enough…), making a process sleep like this is not a problem from an efficiency standpoint (i.e. the runtime can easily handle it).

I think that warning is to caution against naive time-based timeouts when there is a better way to do that. In this case however, it seems perfectly legitimate (to this newbie, at least) to do what you are doing since 1) there’s no more direct way of doing it, and 2) you know the exact amount of time you need to wait before performing the request again.

dwark · February 9, 2024, 7:29am

Wouldn’t that block the GenServer due to sleeping while handling a single API request?

Perhaps the GenServer could store the API call(s) to be delayed in a backlog (a map) keyed by some ID, and use send_after to message itself to take the API call with ID x from the backlog and try again. That way, it would still be able to handle incoming API calls instead of queueing them up in its message inbox. Assuming, of course, that those API calls are not all to the same endpoint and/or do not fall under the same rate limit…
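A minimal sketch of that backlog idea, assuming a hypothetical `dispatch` function (standing in for the real HTTP call) that returns `{:ok, result}` or `{:wait, milliseconds}`, with results delivered to the submitter as plain messages. The module name, `submit/3` and the message shapes are illustrative only, and as noted above this only helps when requests do not all share one rate limit.

```elixir
defmodule BacklogRetry do
  use GenServer

  # `dispatch` is a hypothetical stand-in for the HTTP call and must return
  # {:ok, result} or {:wait, milliseconds}. Results are sent to `reply_to`.
  def start_link(dispatch), do: GenServer.start_link(__MODULE__, dispatch)

  def submit(server, request, reply_to),
    do: GenServer.cast(server, {:submit, request, reply_to})

  @impl true
  def init(dispatch), do: {:ok, %{dispatch: dispatch, backlog: %{}, next_id: 0}}

  @impl true
  def handle_cast({:submit, request, reply_to}, state) do
    id = state.next_id
    {:noreply, attempt(id, {request, reply_to}, %{state | next_id: id + 1})}
  end

  @impl true
  def handle_info({:retry, id}, state) do
    {entry, backlog} = Map.pop(state.backlog, id)
    {:noreply, attempt(id, entry, %{state | backlog: backlog})}
  end

  defp attempt(id, {request, reply_to} = entry, state) do
    case state.dispatch.(request) do
      {:ok, result} ->
        send(reply_to, {:result, request, result})
        state

      {:wait, ms} ->
        # Park the request under its ID and schedule a retry; the server
        # keeps handling new submissions in the meantime.
        Process.send_after(self(), {:retry, id}, ms)
        %{state | backlog: Map.put(state.backlog, id, entry)}
    end
  end
end
```

A request that gets a `{:wait, ms}` answer does not hold up later submissions: they are dispatched immediately while the parked one waits for its `{:retry, id}` message.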

arcanemachine · February 9, 2024, 7:51am

Cunningham’s Law saves the day again!

So is the GenServer needed at all? Or could it just be some function that calls itself after receiving a 429 and waiting for the timeout?

jswanner · February 9, 2024, 8:03am

I would not recommend using Process.sleep for this, if only because there are bookkeeping tasks done under the hood that rely on message passing, and you’re pausing all of that while sleeping.
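To make the bookkeeping point concrete, a small sketch (the `SleepyServer` module is made up for illustration): while a callback is inside `Process.sleep/1`, the process cannot answer system messages, so a `:sys.get_state/2` call with a short timeout exits instead of returning the state.

```elixir
defmodule SleepyServer do
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  # Mimics the "sleep on 429" approach: the callback blocks the whole process.
  @impl true
  def handle_cast({:sleep, ms}, state) do
    Process.sleep(ms)
    {:noreply, state}
  end
end

{:ok, pid} = GenServer.start_link(SleepyServer, %{})
GenServer.cast(pid, {:sleep, 500})

# :sys.get_state/2 is served through a system message. While the callback
# sleeps, nothing reads the mailbox, so the call exits after 100 ms.
result =
  try do
    {:ok, :sys.get_state(pid, 100)}
  catch
    :exit, reason -> {:exit, reason}
  end
```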

I would use :queue to enqueue the requests you want to have wait (storing each caller’s from value and eventually replying with GenServer.reply/2). The calling processes will be blocked the whole time (or until they time out), but you don’t need the GenServer itself to be sleeping for that.
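For readers new to Erlang’s `:queue`, a sketch of the first-in, first-out behaviour this relies on. The atoms `:caller_a` and `:caller_b` are placeholders for the real `from` tuples a GenServer receives in `handle_call/3`; in a real server each `from` would later be passed to `GenServer.reply/2`.

```elixir
# Placeholder entries: in a GenServer these would be {from, request} pairs.
queue = :queue.new()
queue = :queue.in({:caller_a, :request_1}, queue)
queue = :queue.in({:caller_b, :request_2}, queue)

# Cheap introspection, e.g. to report how many callers are waiting.
pending = :queue.len(queue)

# :queue.out/1 removes from the front, so the earliest caller is served first.
{{:value, first}, queue} = :queue.out(queue)
{{:value, second}, queue} = :queue.out(queue)

# An empty queue reports :empty instead of raising.
{:empty, _queue} = :queue.out(queue)
```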

dwark · February 9, 2024, 8:04am

Probably, but I am not sure I would recurse by calling myself again unless it’s with a flag that says it’s a retry, in which case it either succeeds or fails. Otherwise you may end up recursing endlessly, depending on the responses you get…

_jonas · February 9, 2024, 11:25am

Thanks very much for the replies everyone!

Yes, but that is my intention. I was thinking of it like this:

This GenServer is responsible for making all calls for this particular API and nothing else. When the rate limit is hit, I do not want any new calls to be sent to this API until the Retry-After time has passed, so I explicitly want it to block.
From my basic understanding of the BEAM, this would mean no calls get dropped: instead they pile up in the message inbox of the GenServer process and are processed once the process unblocks. I feel like this is ideal for what I want to achieve.

“Assuming of course those API calls are not all to the same end-point and/or do not fall under the same rate-limit…”

Ah, maybe that was a misunderstanding: all calls to this API fall under the same rate limit, because the API rate limits based on my IP. Hence my wish to block and delay all calls when the rate limit is hit.

Ah, this is an interesting insight that I didn’t know about or think of before, thank you.

Okay, I think I understand this in principle; I’m just unsure how to implement waiting for the timeout returned by the header.
After searching around a bit I found Erlang’s :timer.send_after/3. I suppose that is what I want?
So I would accept requests and always add them to a :queue, then process the :queue. If the limit is hit, I stop processing the queue, set a flag in the GenServer state saying that no new requests can be sent, and use :timer.send_after to send the GenServer a message that resets the flag to false and resumes processing. Is that correct?
This is a fair amount of added complexity, in my head at least, but if those GenServer bookkeeping tasks are important, I suppose it is better than sleeping.
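Both `:timer.send_after/3` and Elixir’s `Process.send_after/3` deliver an ordinary message after a delay without blocking the sender; inside a GenServer that message arrives in `handle_info/2`. A small sketch run from a plain process (the `:resume` message names are arbitrary):

```elixir
# Argument order differs: Process.send_after(dest, msg, ms) vs
# :timer.send_after(ms, dest, msg).
_ref = Process.send_after(self(), :resume, 50)
{:ok, _tref} = :timer.send_after(100, self(), :resume_via_timer)

# Neither call blocked; we only wait here to collect the two messages.
received =
  for _ <- 1..2 do
    receive do
      msg when msg in [:resume, :resume_via_timer] -> msg
    after
      1_000 -> :timeout
    end
  end
```

`Process.send_after/3` returns a reference that can be passed to `Process.cancel_timer/1` if a pending pause ever needs to be cut short.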

I’m thinking the GenServer is needed because I want to delay any other calls to the same API for the duration specified by the Retry-After header, rather than just adding a delay to a single call. This is why I would want to create this bottleneck that would not let any new messages through to the API while waiting for the time to pass.

dwark · February 9, 2024, 11:45am

I think @arcanemachine is right that you probably do not need a GenServer for this.
Using something like:

    receive do
    after
      time_in_ms -> call_api(url, max_tries - 1)
    end

in the part that handles the Retry-After response would do the trick as well, no? And call_api(url, max_tries) would return an {:error, :max_tries} or something similar once max_tries reaches 0.

Assuming you have only one caller that calls call_api(url, max_tries), of course.
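A sketch of that bounded retry, assuming a hypothetical `request_fun` that returns `{:ok, body}` or `{:wait, milliseconds}` in place of the real Finch call and URL. `max_tries` caps the total number of attempts, which also addresses the endless-recursion concern; the wait still blocks the caller, just as `Process.sleep/1` would.

```elixir
defmodule BoundedRetry do
  # `request_fun` is a hypothetical stand-in for the real HTTP call; it must
  # return {:ok, body} or {:wait, milliseconds}.
  def call_api(_request_fun, 0), do: {:error, :max_tries}

  def call_api(request_fun, max_tries) when max_tries > 0 do
    case request_fun.() do
      {:ok, body} ->
        {:ok, body}

      {:wait, ms} ->
        # Blocks the calling process only, just like Process.sleep(ms).
        receive do
        after
          ms -> call_api(request_fun, max_tries - 1)
        end
    end
  end
end

always_limited = fn -> {:wait, 10} end
result = BoundedRetry.call_api(always_limited, 3)
# → {:error, :max_tries}
```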

_jonas · February 9, 2024, 12:46pm

Hm, this is a bit difficult to parse for me as someone new to the language, let me see if I understand correctly:

You are using receive, which processes messages from the current process’s inbox. There are no clauses in the receive block, though, so it just blocks the process until the time specified in the after block has elapsed, correct?

Is this not the same as Process.sleep since receive will block the process until the time has elapsed?

But I understand you are suggesting introducing the waiting time in the caller. I don’t think this will serve me well, since I want to coordinate the waiting between callers that could call from any process, and have them all delayed until the rate limit is lifted.

I don’t want to drop or reject any calls under any circumstances, they should always just wait until the current rate limit is over.

No, that’s not a limitation I am willing to introduce; there could be any number of callers from different processes.

dwark · February 9, 2024, 1:14pm

Correct.

Well, if you have callers from different processes then you probably want to go with @jswanner’s advice.

_jonas · February 9, 2024, 1:27pm

Great, thank you!
So is this empty receive block thing with timeout a common pattern, and is it better than doing Process.sleep or does it block the process in the same way?
If I do this in my GenServer instead of Process.sleep, would it allow the GenServer to still do its bookkeeping tasks?
Sorry for all the questions; I can’t really find any info on this pattern online.

For reference, here is my implementation with sleep, which is called from inside the GenServer. It works great so far, but I am worried about the bookkeeping issue mentioned by @jswanner:

    @spec dispatch_request(Finch.Request.t()) :: any()
    defp dispatch_request(request) do
      # Crashes with a MatchError on transport errors such as timeouts.
      {:ok, response} = Finch.request(request, Backend.Finch)

      cond do
        response.status in 200..299 ->
          Jason.decode!(response.body)

        response.status == 429 ->
          Logger.warning("Paddle request rate limit exceeded!")
          headers = Enum.into(response.headers, %{})
          {wait_seconds, _} = Integer.parse(headers["retry-after"])
          Logger.warning("Waiting for #{wait_seconds} seconds to retry request.")
          # Blocks the whole GenServer process, including system messages.
          Process.sleep(wait_seconds * 1000)
          Logger.warning("Wait period over, request will be retried.")
          dispatch_request(request)

        true ->
          # Structs do not implement String.Chars, so inspect/1 the response
          # before interpolating it.
          raise "Paddle request failed irrecoverably: #{inspect(response)}"
      end
    end

dwark · February 9, 2024, 2:03pm

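It’s basically the same thing: Process.sleep/1 is itself implemented as an empty receive with an after clause, and it additionally validates that the timeout is a non-negative integer or :infinity.
Not sure whether it’s a common pattern or not.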
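A small experiment illustrating this: both forms block the calling process for the given time and leave unrelated messages in the mailbox untouched, because neither has a clause that matches them. The 100 ms figure is arbitrary.

```elixir
send(self(), :unrelated_message)

{sleep_us, :ok} = :timer.tc(fn -> Process.sleep(100) end)

{receive_us, :ok} =
  :timer.tc(fn ->
    receive do
    after
      100 -> :ok
    end
  end)

# Neither wait consumed the message sent beforehand.
{:messages, mailbox} = Process.info(self(), :messages)

IO.inspect({div(sleep_us, 1000), div(receive_us, 1000)}, label: "elapsed ms")
```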

Your current method blocks your GenServer from handling system messages under the hood: the callbacks you write are just one part of what the GenServer process does.

You probably want a GenServer state with a queue of some sort (it could be a simple list, processed LIFO, if you don’t care about the order in which requests arrive versus the order in which they are handled) in combination with a wait flag.

If wait is true, simply prepend the incoming request to the list of pending requests (or use a :queue). If wait is false and queue is empty, process the request by calling dispatch_request/1. If queue is non-empty, queue the request and start processing the queue.

dispatch_request/1 should return either:

{:ok, response} or
{:wait, milliseconds} if it sees a 429 response.
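A sketch of such a function, kept pure so it can be tested without Finch: it takes the status, header list and body that a `Finch.Response` carries. The module name is hypothetical, it returns the body rather than the whole response, it assumes lowercase header names (as in the snippet above), and it only handles the delta-seconds form of `Retry-After`; the header may also carry an HTTP date, which this sketch reports as an error.

```elixir
defmodule PaddleResponse do
  # 2xx: hand back the raw body (decoding, e.g. with Jason, is left to the caller).
  def classify(status, _headers, body) when status in 200..299, do: {:ok, body}

  # 429: translate the Retry-After seconds into milliseconds.
  def classify(429, headers, _body) do
    with {_, value} <- List.keyfind(headers, "retry-after", 0),
         {seconds, ""} <- Integer.parse(value) do
      {:wait, seconds * 1000}
    else
      nil -> {:error, :missing_retry_after}
      _ -> {:error, :unsupported_retry_after}
    end
  end

  # Anything else is treated as a non-retryable failure.
  def classify(status, _headers, body), do: {:error, {status, body}}
end
```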

In the first case, if queue is non-empty, you process the next entry.
In the latter case, you queue the (failed) request, set wait to true and use Process.send_after/4 to send yourself a :resume message at a later point in time.

Something like that anyway.
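Putting the pieces described above together, a minimal sketch, assuming a hypothetical `dispatch` function (in place of `dispatch_request/1`) that returns `{:ok, result}` or `{:wait, milliseconds}`. Callers block in `GenServer.call/3` until `GenServer.reply/2` answers them, while the server itself only waits for messages during the pause, so it stays responsive to system messages. Module and function names are illustrative.

```elixir
defmodule RateLimitedClient do
  use GenServer

  # `dispatch` is a hypothetical stand-in for dispatch_request/1: it must
  # return {:ok, result} or {:wait, milliseconds}.
  def start_link(dispatch), do: GenServer.start_link(__MODULE__, dispatch)

  # Callers block here until the server replies; :infinity because requests
  # must never be dropped.
  def request(server, req), do: GenServer.call(server, {:request, req}, :infinity)

  @impl true
  def init(dispatch), do: {:ok, %{dispatch: dispatch, queue: :queue.new(), wait: false}}

  @impl true
  def handle_call({:request, req}, from, state) do
    # Always enqueue first so FIFO order is preserved, then drain if allowed.
    state = %{state | queue: :queue.in({from, req}, state.queue)}
    {:noreply, drain(state)}
  end

  @impl true
  def handle_info(:resume, state) do
    {:noreply, drain(%{state | wait: false})}
  end

  # While waiting, leave everything in the queue.
  defp drain(%{wait: true} = state), do: state

  defp drain(state) do
    case :queue.out(state.queue) do
      {:empty, _} ->
        state

      {{:value, {from, req}}, rest} ->
        case state.dispatch.(req) do
          {:ok, result} ->
            GenServer.reply(from, {:ok, result})
            drain(%{state | queue: rest})

          {:wait, ms} ->
            # Keep the failed request at the head of the queue and pause
            # until the :resume message arrives.
            Process.send_after(self(), :resume, ms)
            %{state | wait: true}
        end
    end
  end
end
```

Each dispatch still runs inside the server process, so one slow HTTP call delays the others; that matches the single-lane design discussed in this thread.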

_jonas · February 9, 2024, 2:09pm

Alright, thanks very much for your time. I think I get the idea now.

derek-zhou · February 9, 2024, 2:27pm

    receive do
    after
      time_in_ms -> call_api(url, max_tries - 1)
    end

I don’t recommend this, because then you cannot limit the concurrency of API calls, and that could make the API provider unhappy, especially if you are on a free or budget tier.

A GenServer with a progressive Process.sleep after receiving a 429 is OK for low-traffic, one-way API calls (think webhooks).

dimitarvp · February 9, 2024, 2:51pm

I keep shilling for this technique regularly now: GenServer.reply: Don't Call Us, We'll Call You

You do not block the GenServer while at the same time you make the caller wait. It’s ideal IMO because the caller doesn’t have to invent extra logic to match on a return value and do Process.sleep; this pattern will just block it until the GenServer sends it what it needs (which it will not do immediately due to the rate-limiting constraints).
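A compact sketch of the deferred-reply idea in isolation (the `DeferredReply` module and its messages are made up for illustration): the server answers a slow call later via `GenServer.reply/2`, and keeps answering other calls in the meantime.

```elixir
defmodule DeferredReply do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:delayed, value, ms}, from, state) do
    # Don't answer now: keep `from` and schedule the answer instead.
    Process.send_after(self(), {:reply_now, from, value}, ms)
    {:noreply, state}
  end

  def handle_call(:ping, _from, state), do: {:reply, :pong, state}

  @impl true
  def handle_info({:reply_now, from, value}, state) do
    GenServer.reply(from, value)
    {:noreply, state}
  end
end

{:ok, pid} = DeferredReply.start_link()
slow = Task.async(fn -> GenServer.call(pid, {:delayed, :done, 200}) end)

# Answered right away: the server is not blocked by the pending slow call.
ping = GenServer.call(pid, :ping, 50)
result = Task.await(slow)
```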

dwark · February 9, 2024, 3:06pm

Agreed, but that was assuming there were no concurrent calls …

jswanner · February 9, 2024, 5:09pm

For sure, it’s a matter of “quick and dirty” (Process.sleep), or more robust (GenServer.reply). It’s not just about the bookkeeping thing I mentioned, you might also want to introspect this process from time to time, to see how big the queue is or whatever, and you can’t do that if the process is blocking.

Another question: do you need a queue or can you load shed (drop requests)?

dwark · February 10, 2024, 8:39am

Very useful! Thank you, bookmarked for later use …

_jonas · February 10, 2024, 1:58pm

Yeah that’s fair, I just like the simplicity of using the message queue that already exists for the process, but it’s great to be aware of the tradeoff, I may refactor to this queue and reply method depending on the needs of the app.

No, I absolutely never want to drop requests, but I’m also not expecting sustained high load. Perhaps I can give some context:

This is for interacting with the Paddle API. Paddle is a payments processor for digital goods. I have a large inventory of items in my own database that I need to synchronize with the catalog in Paddle, creating and updating items there. Occasionally I also need to fetch orders for fulfillment (this will be very low volume, literally maybe once per day, if even that), which would probably be initiated from a webhook.

The request limit is 240 per minute, and the product catalog I’m synchronizing is larger than that, so a full catalog sync will always exceed the rate limit. In that case I just want the sync to pause for the full timeout (it’s always 60 seconds, but I’m reading it from the header anyway) and then continue. During that wait time, all requests are simply forced to wait; they cannot be dropped.
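A back-of-the-envelope check of what a full sync costs, assuming exactly one request per catalog item and a full 60-second pause every time the 240-per-minute limit is hit (the module and function names are hypothetical):

```elixir
defmodule SyncEstimate do
  @limit_per_minute 240
  @pause_seconds 60

  # How many times a sync of `items` requests hits the limit and must pause.
  def pauses(items) when is_integer(items) and items > 0 do
    div(items - 1, @limit_per_minute)
  end

  # Time spent paused under these assumptions (request time not included).
  def paused_seconds(items), do: pauses(items) * @pause_seconds
end
```

For example, a 1,000-item catalog would pause four times (after requests 240, 480, 720 and 960), i.e. spend about four minutes waiting.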

So I know for sure that the limit will only be reached when I do a full sync of the catalog, which shouldn’t really happen a lot. If I happen to route more requests through there, I may upgrade to a more ‘solid’ solution with regards to rate limiting.

Anyway, thanks again for the input, everyone. I really appreciate it and definitely learned some things.

dimitarvp · February 10, 2024, 2:17pm

Oh, so if you exhausted your limit for the minute you have to wait all the way until the next minute starts?

Say you do 60 requests between 13:20:00 and 13:20:15, now you have to wait 45 seconds until 13:21:00 comes around?
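Under the fixed-window reading in this question (windows aligned to wall-clock minutes, which is an assumption here, not something the thread confirms about Paddle), the wait is simply the time left until the next minute boundary. The module name is hypothetical.

```elixir
defmodule FixedWindow do
  # Assumption: the limit resets at the start of each wall-clock minute.
  def seconds_until_reset(%Time{second: second}), do: 60 - second
end

FixedWindow.seconds_until_reset(~T[13:20:15])
# → 45
```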