Best way to handle 10s or 100s of thousands of API requests?

Hello, everyone!
I have an Elixir Phoenix 1.7 app deployed to fly.io. I receive 10,000+ API calls at a time, and for each one I need to acknowledge it as quickly as possible, make a few API calls of my own, do something with the information that comes back, and eventually report a status for the incoming request. I ran out of memory, and OK, I know how to pay more and add memory…
But now I’m wondering about the best way to handle this. I receive each item one at a time, so I don’t think I have any context that would let me batch calls. Is ETS a good solution for tracking the status of each API call, or should I look into setting up a queue? This is obviously not my area of expertise, and I’m new to Elixir and Phoenix on top of that.
I would greatly appreciate insights from people who know better and have more experience.
Thank you so much!

1 Like

What mandates this? Are you positive this is the case?

The first step that comes to mind is determining whether data loss is acceptable.

While ETS is highly efficient, it achieves that by storing data in memory, meaning data could be lost on a crash or restart. If data loss isn’t acceptable, then look into disk-based ETS (aka DETS), Mnesia (which builds on ETS and DETS), or persisting to a database.
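
For illustration, tracking status in ETS could look something like this (a minimal sketch; the module and table names are hypothetical):

```elixir
# Hypothetical module owning an ETS table of per-request statuses.
defmodule MyApp.RequestStatus do
  @table :request_status

  # Create a named, public table once at app startup
  # (e.g. from Application.start/2).
  def init do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
  end

  def put(request_id, status), do: :ets.insert(@table, {request_id, status})

  def get(request_id) do
    case :ets.lookup(@table, request_id) do
      [{^request_id, status}] -> {:ok, status}
      [] -> :not_found
    end
  end
end
```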

On another note, to avoid rate limiting when making those subsequent API calls, take a look at GenStage and Broadway (which is built on GenStage) for a backpressure mechanism.
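
A rough sketch of the backpressure idea with plain GenStage (module names are hypothetical; the point is that the consumer only ever asks for as much work as it can handle):

```elixir
defmodule MyApp.VendorProducer do
  use GenStage

  def start_link(_), do: GenStage.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok), do: {:producer, :no_state}

  # GenStage calls this when consumers ask for more events;
  # demand-driven dispatch is what provides the backpressure.
  def handle_demand(demand, state) do
    events = MyApp.Queue.take(demand)  # hypothetical source of buffered requests
    {:noreply, events, state}
  end
end

defmodule MyApp.ApiCaller do
  use GenStage

  def start_link(_), do: GenStage.start_link(__MODULE__, :ok)

  def init(:ok) do
    # max_demand caps how many events are in flight at once.
    {:consumer, :no_state, subscribe_to: [{MyApp.VendorProducer, max_demand: 10}]}
  end

  def handle_events(events, _from, state) do
    Enum.each(events, &MyApp.ThirdParty.call/1)  # hypothetical API client
    {:noreply, [], state}
  end
end
```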

2 Likes

I think so, yes.
It’s an integration with a vendor that basically sends one row at a time.
So, I have no idea whether only one is coming or half a million, if that makes sense.

@codeanpeace Thanks for this point. Data loss is not acceptable, so thanks for adding clarity.

Do you have to respond back with status within the same API request? Or is that done async later?

Good clarification. I ack the request ASAP, then return status elsewhere.

1 Like

Plausible Analytics does something similar, take a look at: https://github.com/plausible/analytics/blob/master/lib/plausible/event/write_buffer.ex

1 Like

per second? per minute? per hour?

Well, if you want to absolutely, positively respond to every request while having a very limited upstream 3rd-party API, you should either make this async – as in, “Thanks for your request, we’ll email you the results when able” – or put all requests in e.g. Kafka and have background workers chewing through them (as fast as the 3rd-party API rate limits allow).
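
The “acknowledge now, work later” shape could look roughly like this in a Phoenix controller (a sketch; the module and enqueue function are hypothetical):

```elixir
defmodule MyAppWeb.IngestController do
  use MyAppWeb, :controller

  def create(conn, params) do
    # Persist/enqueue first so the request survives a crash,
    # then acknowledge immediately; the real work happens elsewhere.
    :ok = MyApp.Ingest.enqueue(params)  # hypothetical durable hand-off

    conn
    |> put_status(:accepted)  # 202: received, not yet processed
    |> json(%{status: "accepted"})
  end
end
```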

In general though, I can’t see how “I want to serve all requests” and “…but the data provider I am depending on is very slow” can be combined. It seems impossible at the outset.

As fast as the vendor can send them, so more on the order of “per second”.

Yes, that is what I’m doing. The vendor sends info, I acknowledge I got it, then set about getting the info from other API calls, and finally return a status asynchronously later.
I’m trying to avoid a Kafka-type solution and hoping for something a little sleeker and simpler, if that makes sense.
Most importantly, I’d like to understand a more canonical Elixir way of doing this type of work, which Elixir is supposed to be particularly good at.
At this point, I’m looking at using Oban for persistence and Mnesia for handling the queue. I’d love a sanity check on this thinking; all feedback here is really appreciated.
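Roughly what I have in mind for the Oban piece (a sketch; the worker and the two client modules are hypothetical, but Oban itself persists jobs in Postgres and retries failures):

```elixir
defmodule MyApp.Workers.ProcessRow do
  # Oban jobs live in a Postgres table, so they survive restarts;
  # max_attempts gives automatic retries with backoff.
  use Oban.Worker, queue: :vendor_rows, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"row_id" => row_id}}) do
    with {:ok, info} <- MyApp.ThirdParty.fetch(row_id),    # hypothetical API client
         :ok <- MyApp.Reports.send_status(row_id, info) do # hypothetical status callback
      :ok
    end
  end
end

# Enqueueing, e.g. from the controller that acknowledges the vendor:
# %{row_id: id} |> MyApp.Workers.ProcessRow.new() |> Oban.insert()
```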

Thanks for this!
I didn’t realize Plausible uses Elixir!
Very cool!

Forget about it [Mnesia]

That’s okay [Oban], but it won’t be able to handle 100k rps


  1. Do you want persistence (i.e., surviving app restarts)? If yes, use any persistent job queue. If not, use any in-memory storage solution (ETS, RocksDB, Memcached).

  2. Can this 3rd-party service handle 100k rps? If not, you’ll need a circuit breaker to drop the requests that would otherwise overload your app (see the sketch after this list).

  3. Can you batch requests to this 3rd-party service? If yes, use a job queue with batching.

  4. Can you run multiple instances of this 3rd-party service? If yes, can you use hashring routing?

  5. Is your API idempotent? Can you cache responses to popular requests? If yes, use Nebulex.
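
For point 2, a circuit breaker could be sketched with the :fuse Erlang library (the fuse name and thresholds here are arbitrary):

```elixir
# Install a fuse once at startup: blow after 5 failures within 10s,
# then reset (close the circuit again) after 60s.
:fuse.install(:third_party_api, {{:standard, 5, 10_000}, {:reset, 60_000}})

defmodule MyApp.ThirdParty.Guarded do
  # Wrap each upstream call; fail fast when the circuit is blown.
  def call(request) do
    case :fuse.ask(:third_party_api, :sync) do
      :ok ->
        case MyApp.ThirdParty.call(request) do  # hypothetical API client
          {:ok, _} = ok ->
            ok

          {:error, _} = err ->
            :fuse.melt(:third_party_api)  # record a failure against the fuse
            err
        end

      :blown ->
        {:error, :circuit_open}
    end
  end
end
```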

2 Likes

As always, “it depends”… I would probably use an external queue like RabbitMQ and then process it with GenStage/Broadway.
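
Something roughly like this, assuming the broadway_rabbitmq package (queue name and concurrency numbers are made up):

```elixir
defmodule MyApp.Pipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        # Pulls from RabbitMQ with backpressure; unacked messages
        # are redelivered if the node dies mid-processing.
        module:
          {BroadwayRabbitMQ.Producer,
           queue: "vendor_rows", on_failure: :reject_and_requeue},
        concurrency: 2
      ],
      processors: [
        # Cap concurrency to stay inside the 3rd-party rate limit.
        default: [concurrency: 10]
      ]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    MyApp.ThirdParty.call(message.data)  # hypothetical API client
    message
  end
end
```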

Elixir (BEAM) gives you all the tools to do this all in the same program, but it is not necessarily easy to get it right. :slight_smile:

If Oban is quick enough for your purposes, then that is also a solid choice, I think.

2 Likes

Please help me understand what I’m seeing here…
It looks like they catch an event, push it into a buffer, and once the buffer starts getting full, write it out to storage. I guess the buffer is basically the queue?
Meanwhile, they spin up GenServers as separate processes to handle the events in the buffer asynchronously? And those also handle adding items to and removing items from the queue of things to handle?
I’d like to fully appreciate what I’m seeing here. :slight_smile:
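
To check my understanding, here’s a stripped-down sketch of that buffering pattern as I read it (not Plausible’s actual code; the names and limits are invented):

```elixir
defmodule MyApp.WriteBuffer do
  use GenServer

  @max_size 1_000        # flush when this many events are buffered
  @flush_interval 5_000  # ...or at least every 5 seconds

  def start_link(_), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  # Callers just cast and move on: this is why the HTTP request
  # can be acknowledged immediately.
  def insert(event), do: GenServer.cast(__MODULE__, {:insert, event})

  def init(buffer) do
    Process.send_after(self(), :flush, @flush_interval)
    {:ok, buffer}
  end

  def handle_cast({:insert, event}, buffer) do
    buffer = [event | buffer]

    if length(buffer) >= @max_size do
      flush(buffer)
      {:noreply, []}
    else
      {:noreply, buffer}
    end
  end

  def handle_info(:flush, buffer) do
    flush(buffer)
    Process.send_after(self(), :flush, @flush_interval)
    {:noreply, []}
  end

  defp flush([]), do: :ok
  defp flush(buffer), do: MyApp.Storage.write_batch(Enum.reverse(buffer))  # hypothetical sink
end
```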

@kwando Thanks for this response.
I was thinking “Why not just use Rabbit (or some other actual queue)?”
But I was worried I was missing something basic and sort of “cheating” by not doing things the Elixir way.
But if I understand you correctly, if I can expect to eventually receive millions of API requests per second, I’m better off just setting up a proper queue now. That way, even if my servers fail somehow, there is an external, very fast, reliable source of truth that they can come back up and resume working from.
Does that sound right?

Millions of requests per second?! I don’t think whatever you are building now will be the same thing you are running at those levels… certainly not with a stock RabbitMQ queue.

My thinking with an external queue is that building a durable and performant queue is not trivial, so I wouldn’t start by building one on my own.

Are those 10K+ rps you are talking about sustained load or bursts?

2 Likes

@hissssst Thanks for this!
Why forget about Mnesia? It won’t be able to handle this type of load? Wrong tool for the job? Just want to be sure I understand you.
I’m thinking through this a bit…

I do want persistence, so RabbitMQ?

The third-party API can handle 100k rps, but asks for no more than 100 requests across 10 threads per second. My problem here is that I don’t have the context to know whether more requests are coming… I guess I could use a timer here and batch based on what comes in within the time limit?
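
If I go the timer/batch route, maybe something like Task.async_stream with a concurrency cap would keep me inside the 10-thread limit (a sketch; MyApp.ThirdParty.call/1 is a hypothetical client):

```elixir
# `batch` is the list of requests collected within the time window.
results =
  batch
  |> Task.async_stream(&MyApp.ThirdParty.call/1,
    max_concurrency: 10,  # never more than 10 in-flight upstream calls
    timeout: 30_000,
    on_timeout: :kill_task
  )
  |> Enum.to_list()
```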

I only reach out to the 3rd party via a RESTful API, and I only have one node at present. I’m reading up on consistent hashing, so thanks for putting this on my radar.

My API is idempotent, but I don’t expect to get frequent requests for the same row. Plus, the information can change at any moment, so what was true yesterday may have been updated. As I understand it now, I wouldn’t expect much mileage out of caching the data I’m fetching, unfortunately.

1 Like

Bursts at the moment, but we’re aiming to handle a quarter of a million rows at a time by the end of the year.
Your points are well-taken.
I really appreciate your insights here, as well as the contributions from others.