Best way to handle 10s or 100s of thousands of API requests?

Please help me understand what I’m seeing here…
It looks like they catch an event, push it into a buffer, and once the buffer starts getting full, write it to disk. I guess that is basically the queue?
Meanwhile, they spin up GenServers as separate processes to handle the events in the queue asynchronously? Those also handle adding items to and removing items from the queue?
I’d like to fully appreciate what I’m seeing here. :slight_smile:

Then it is not idempotent.

Yes, it is the correct solution.

RabbitMQ can do this. There are other persistent queues and storages: you should take a look at RocksDB, Redis, Kafka, Postgres, etc. I'd even suggest using Nebulex with a 2-level cache: one level in memory and the other on disk.

Anyway, you should use the storage only for persistence (to recover from restarts), keeping hot events in memory.
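For example, a two-level Nebulex cache could look roughly like this. This is only a sketch based on Nebulex v2's multilevel adapter: the module names are invented, and NebulexRedisAdapter is just one possible choice for the persistent level, since nothing here prescribes a specific on-disk adapter.

    defmodule MyApp.Cache do
      use Nebulex.Cache,
        otp_app: :my_app,
        adapter: Nebulex.Adapters.Multilevel

      defmodule L1 do
        # Hot events stay here, in memory.
        use Nebulex.Cache,
          otp_app: :my_app,
          adapter: Nebulex.Adapters.Local
      end

      defmodule L2 do
        # Persistent level, used only to recover after a restart.
        use Nebulex.Cache,
          otp_app: :my_app,
          adapter: NebulexRedisAdapter
      end
    end

    # config/config.exs
    config :my_app, MyApp.Cache,
      model: :inclusive,
      levels: [
        {MyApp.Cache.L1, gc_interval: :timer.hours(12)},
        {MyApp.Cache.L2, []}
      ]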

RabbitMQ is a hard dish to cook. You need some experience with it, since it behaves incorrectly in some edge cases. For example, it can just drop all the data (literally all of it, not just new data) when your disk gets full. It has no master-master replication, and it can be extremely difficult to recover the data after a split-brain.

3 Likes

Yes, wildly optimistic best case scenario of millions per second.
So, thanks for your response.
Basically, we had a customer ask that turned into a “hold my beer” coding moment in which I thought, “OK, Elixir is supposed to be good for this kind of stuff.” So, I got it working, and immediately several groups wanted to throw hundreds of thousands of requests at it at a time.
So, you are exactly right: what I am working on (and have been working on) is NOT that 1M/s solution. But, because of how this PoC has snowballed into an enterprise utility, I’m trying to get my head wrapped around what steps I can take now in the right direction.
I greatly appreciate your comments and help in trying to understand what this looks like.

1 Like

Thanks for this, @hissssst
This really helps me think about this the right way, while avoiding problems I haven’t even considered yet. Avoiding potential problems is invaluable. I can’t thank you enough.
So, the app is generic Phoenix and has a Postgres DB. It sounds like there is a bottom-line decision to weigh among the different persistence and queuing options, but these tools are the right general direction. And I particularly appreciate the advice to use storage only for recovering from restarts while keeping hot events in memory.

I need to think this through a bit, but it’s a nice generalization to have in mind.

I haven’t heard of Nebulex before this thread, so I will read up on that. Thank you for putting that on my radar.

Since there still seems to be some confusion, let me try my hand at clarifying:

Elixir is absolutely great, almost perfect, for the scenario you are outlining.

The problem however stems from your steep requirements:

  1. Raw speed. If you want every single request that comes to your app to be served on a timely basis (5-30s), and we’re looking at 6-7 digits of requests per second, this becomes difficult even for uber-fast languages like Rust (and, with some tuning, Golang). Not to mention the load balancers you need to put in front of all this, since a single node can only occupy around ~60k network ports (there are workarounds, but that is a long topic).

  2. Persistence. You can easily spawn a million Elixir processes, each serving a single request (or pertaining to a single object or batch of objects), and it is practically the best runtime environment on the planet for this… BUT… if your VM node goes down, all these spawned tasks disappear (see the sketch just below this list). If that’s undesirable, then you should use Oban. And once persistence and atomicity get involved, you’re looking at a drastically reduced throughput of requests (tasks) per second.
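To make point 2 concrete, here is a toy sketch (the names are made up): spawning huge numbers of processes is cheap, but none of it survives a node restart.

    # Spawning a million supervised tasks is cheap on the BEAM...
    {:ok, _} = Task.Supervisor.start_link(name: Demo.TaskSup)

    for _ <- 1..1_000_000 do
      Task.Supervisor.start_child(Demo.TaskSup, fn ->
        # ...each one could serve a request or process an object here...
        :ok
      end)
    end

    # ...but if the VM node goes down, every in-flight task is gone;
    # nothing is persisted anywhere.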


The part of the picture that I am still missing is this: are you supposed to receive normal HTTP requests and respond within, say, 5-30 seconds to each request? Or are your API endpoints simply supposed to ingest data coming from the outside (in the shape of requests), store it, and eventually process it and send responses… somewhere else, not in an HTTP response?

I couldn’t get that part from your comments, and that detail can change the recommended solution quite a lot.

5 Likes

@dimitarvp , I really appreciate you adding more clarification.
Now it’s my turn.
My app sits between a CMS and another system.
I receive requests from the CMS, which sends them one at a time, regardless of whether the customer is sending one or many.
For each request, I have to look up some information (3rd party API calls), use that information for some business logic, then send out different API requests to a 3rd party, depending on the info I get back from my first outbound API calls.
Finally, I need to do some basic reporting about what happened with each request from the CMS.
The ask from customers is to handle 1 to 250,000 requests at a time, and they are not a particularly large customer. So, if it works for them, and we then bring on others like them (or bigger), I’d like to be sure I’m doing the best I can with the resources available.
I hope this explains enough, but I’m happy to provide more details.
And I really appreciate the time and attention here!

All Elixir processes are queues, as every process has a mailbox to receive messages, and that mailbox is a queue. GenServers are the idiomatic OTP pattern for a process serving a queue.
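To illustrate, a bare GenServer already gives you queue semantics. This is a toy sketch, not anything from your codebase:

    defmodule WorkQueue do
      use GenServer

      # Callers enqueue work; the server processes its mailbox
      # one message at a time, in order.

      def start_link(opts \\ []) do
        GenServer.start_link(__MODULE__, :queue.new(), opts)
      end

      def push(pid, item), do: GenServer.cast(pid, {:push, item})
      def pop(pid), do: GenServer.call(pid, :pop)

      @impl true
      def init(queue), do: {:ok, queue}

      @impl true
      def handle_cast({:push, item}, queue), do: {:noreply, :queue.in(item, queue)}

      @impl true
      def handle_call(:pop, _from, queue) do
        case :queue.out(queue) do
          {{:value, item}, rest} -> {:reply, {:ok, item}, rest}
          {:empty, rest} -> {:reply, :empty, rest}
        end
      end
    end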

Given you stated that you are using a backend API, it’s hard to say how batching will help unless that API supports batching. If you have rate-limiting requirements, then Req will be helpful: it’s based on Finch, maintains connection pools, and can limit the number of requests.
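For example, a shared Finch pool via Req could look roughly like this (the pool name and sizes are illustrative, not from this thread):

    # In the application's supervision tree: one named Finch pool,
    # capped at 50 connections (an illustrative number).
    children = [
      {Finch, name: MyApp.Finch, pools: %{default: [size: 50, count: 1]}}
    ]

    # Every outbound call reuses that pool, so concurrent requests to
    # the third-party API are bounded by the pool size.
    req = Req.new(base_url: "https://third-party.example.com", finch: MyApp.Finch)
    Req.get!(req, url: "/lookup")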

It’s also unclear how you return the status back to clients and this will also have some bearing on the design.

There is no doubt Elixir/Erlang can handle many things and scale to great heights with clustering built in, but it can’t compensate for a poorly architected application.

You will need to carefully identify where things can block and queue up, e.g. does returning the actual result (as distinct from the initial ACK) back to clients happen on the same process or much later, possibly on a different process and via some webhook? Can those result messages to clients be batched or coalesced or is it 1:1 for every inbound request?

Your sending of results back to clients needs to ensure that this also doesn’t hold up processing of other requests that are using the backend API. Again this may or may not be an issue depending on the process and batching model you are using and if backend batching is even possible.

What is your strategy for when incoming work exceeds backend capacity?

At some point you have to introduce backpressure and “fail fast” semantics to preserve availability up to the inherent capacity of the backend service; otherwise queues and memory will balloon, and eventually the OS will kill the Erlang VM when it hits the OS process memory limit.

Even if you had infinite RAM, if the average inbound rate exceeds the nominal backend capacity then you have an impossible situation and must apply back pressure and refuse service to preserve the nominal capacity that the backend can actually service.
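As a sketch of what refusing service could look like at the HTTP edge (the queue-depth function and threshold here are assumptions, not something you already have):

    defmodule MyAppWeb.LoadShedPlug do
      import Plug.Conn

      # Illustrative capacity limit; tune to the backend's real capacity.
      @max_depth 10_000

      def init(opts), do: opts

      def call(conn, _opts) do
        # MyApp.Queue.depth/0 is a hypothetical function reporting
        # how much work is currently buffered.
        if MyApp.Queue.depth() > @max_depth do
          conn
          |> put_resp_header("retry-after", "5")
          |> send_resp(503, "server busy, retry later")
          |> halt()
        else
          conn
        end
      end
    end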

You also need to ensure your supervision tree handles workers failing due to unexpected errors (such as clients and backend services failing, or bugs), so that failures are isolated and don’t impact other clients/requests, and your service stays resilient and keeps on keeping on.

The “bee book”, Designing Elixir Systems with OTP, is worth reading to understand how to architect and think about Elixir OTP applications. Elixir in Action by Saša Jurić is also good for gaining a solid understanding of Elixir at a deeper level.

1 Like

In this case, I think you should use something like RabbitMQ (or any queue provider) or Oban for persistence, so you don’t lose any information.
ETS and Mnesia can be a problem if the server goes down.
Oban can be a bottleneck too, because of the database pool.
I think a queue with a Broadway/GenStage approach could be more effective in your case, given the number of requests per second.
Running the app on more than one node may help too.

2 Likes

Thanks for responding, man.

As @andrewh is saying here:

… and I am not clear on that either.

Again: how exactly is your app called? Is somebody HTTP-GET-ing or HTTP-POST-ing requests to it, and it has to reply synchronously? Or can it publish its results somewhere for the clients to consume later?

1 Like

It is a pleasure to get to have a back and forth about this!
You all are making me much, much smarter!

The CMS sends an HTTP POST to my server with basically one row of data. I receive and acknowledge it and send it to a task.

    def rcv_msg(conn, params) do
      # Use a Task to handle sending the message in the background
      # while returning an acknowledgement ASAP.
      Task.start_link(fn ->
        sendMsg(conn, params)
      end)

      ack = "msg received"

      json(conn, %{ack: ack})
    end

The task is where I call other APIs and process the information from the original post.
After processing, webhooks return statuses.

What happens if the task fails? How will the client know? How will the data be preserved? What happens if the remote service is slow, how will you avoid running out of RAM?

This is why most people are suggesting you put the data into some sort of high-throughput queue as a synchronous operation inside each inbound HTTP request, so that if the other service has any issue you don’t lose data.
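In code, that shape looks roughly like the sketch below, where MyQueue.enqueue/1 is a stand-in for whichever durable queue you pick (RabbitMQ, Oban, etc.):

    # Persist synchronously first, acknowledge second.
    def rcv_msg(conn, params) do
      case MyQueue.enqueue(params) do
        :ok ->
          json(conn, %{ack: "msg received"})

        {:error, _reason} ->
          # Fail fast: tell the CMS to retry rather than silently dropping.
          send_resp(conn, 503, "queue unavailable, retry later")
      end
    end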

3 Likes

@andrewh Thank you for this thoughtful, thorough response!

Yes. Thank you for spelling it out this way.

Yes, that’s why I included that info. Batching isn’t much of an option, afaict. I am using Req! It’s a great library with both high-level and low-level APIs that I understand better the more I use it.

Sorry for the lack of clarity here. I responded to @dimitarvp on this already, but basically, I receive an HTTP POST, send the info into a Task, then return status later via webhooks.

Well said. This is exactly why I’m asking people who are smarter and have more experience.

It is a different process, and it happens later, but I am learning where (and how) things block. This is new-ish territory for me.

Thanks for this. I haven’t thought this through yet.

Yes! So, for back pressure, that seems like a job for GenStage? I really love how Elixir will fail and then come back immediately and keep running; it’s so impressive. I get what you’re saying with “fail fast” semantics, I think, but I don’t know how to implement it. It seems I need an external persistence layer to handle it if/when a node dies? I would really appreciate more on this topic.

Understood.

I need to dig in and understand this better, especially in terms of implementation specifics.

Thank you. I will look into these.

1 Like

Yes, these are exactly my worries and what I’m trying to get a sense of best practices around.
I will dive into the “putting data into some sort of high throughput queue as a synchronous operation inside each inbound HTTP request” today.
I do appreciate you spelling it out this way.

So, just to sort of sum up what I’ve gathered here:

  1. GenStage seems like the right tool for handling spikes in requests, providing back pressure, etc.

  2. But what about the persistence piece?

The GenStage documentation says:

If the stage is a producer, its main responsibility is to receive demand and generate events. Those events may be in memory or an external queue system.

In my case, it would be better to never drop a request at all, so what is the best way to handle this? Which tools and implementation details make the most sense?
In this thread, we’ve discussed RabbitMQ, and that seems like an acceptable starting point. As I understand it, this will allow processes to fail completely, then come back and more or less pick up where they left off. This still won’t prevent loss from, say, back pressure from the GenStages, right?
As I understand it, if the CMS’s POSTs to my server fail, it will automatically try to resend them. So, job one is to accept the POSTs and get them into the external queue.
Then, as @adw632 pointed out, I need to be sure processing doesn’t cause my server(s) to crash.

OR… what have other people tried that really didn’t seem to work for this type of scenario? Cautionary tales and caveats welcome!

If you are going down the GenStage path, then maybe have a look at Stagger. It provides a simple persistent queue, and you can avoid RabbitMQ altogether. Also make sure you are using a persistent volume on Fly so your file-backed queue really is persistent.

For implementing rate-limiting backpressure, this post by Chris McCord might be useful.

You have an HTTP endpoint plug/controller that receives HTTP requests; it just needs to implement the rate-limit check and notify a producer with the job (which will persist the job using Stagger). Your plug or controller can then respond to the client immediately, or return a rate-limit error as required.

The backend consumers request work from the producer and can synchronously do the backend API calls and whatever other blocking processing you have to do, then call the webhook when done. Again, it’s unclear whether you can coalesce the webhook calls, but that can just be another GenStage if needed.
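A bare-bones sketch of that producer/consumer shape follows. The module names are invented, and the Stagger persistence and rate-limit plug are omitted for brevity; the producer simply buffers jobs and honors consumer demand:

    defmodule Ingest.Producer do
      use GenStage

      def start_link(_), do: GenStage.start_link(__MODULE__, :ok, name: __MODULE__)

      # Called from the controller/plug after the rate-limit check.
      def enqueue(job), do: GenStage.cast(__MODULE__, {:enqueue, job})

      @impl true
      def init(:ok), do: {:producer, {:queue.new(), 0}}

      @impl true
      def handle_cast({:enqueue, job}, {queue, demand}) do
        dispatch(:queue.in(job, queue), demand, [])
      end

      @impl true
      def handle_demand(incoming, {queue, demand}) do
        dispatch(queue, demand + incoming, [])
      end

      defp dispatch(queue, 0, events), do: {:noreply, Enum.reverse(events), {queue, 0}}

      defp dispatch(queue, demand, events) do
        case :queue.out(queue) do
          {{:value, job}, rest} -> dispatch(rest, demand - 1, [job | events])
          {:empty, rest} -> {:noreply, Enum.reverse(events), {rest, demand}}
        end
      end
    end

    defmodule Ingest.Consumer do
      use GenStage

      def start_link(_), do: GenStage.start_link(__MODULE__, :ok)

      @impl true
      def init(:ok), do: {:consumer, :ok, subscribe_to: [{Ingest.Producer, max_demand: 10}]}

      @impl true
      def handle_events(jobs, _from, state) do
        # Synchronously call the backend API for each job,
        # then fire the webhook with the result.
        Enum.each(jobs, &process_job/1)
        {:noreply, [], state}
      end

      # Placeholder for the backend call + webhook.
      defp process_job(_job), do: :ok
    end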

Just ensure you have as many GenStage consumers as you need to meet the demand.

It probably doesn’t need to be any more complex than that unless it really needs to be.

The GenStage docs are also very good for understanding demand management.

2 Likes

Building something ad hoc from various tools like GenStage, Stagger, Broadway, RabbitMQ, etc. would certainly work and could be an excellent learning opportunity. As a contrasting view, and as some people have mentioned, you could start with Oban now and scale into the 100k range with little effort (many businesses process tens of millions of Oban jobs daily).

Here are some things to consider (disclaimer, I’m entirely biased and not saying it’s the only way :wink:):

  • Horizontal scaling. Ingesting and handling that many events will require more than one system. Tools like GenStage are bound to a single node, meaning you’ll need a centralized queue to scale out. Oban solves this.
  • Partitioning. You have events coming from distinct users without any pre-defined batching. To effectively de-dupe and process those events, you’ll need your consumers to process events in arbitrary groups. Oban’s partitioned global limits and chunks solve this.
  • Fairness. Bursty or more active clients can drown out less active clients and prevent them from being processed in a timely manner. To combat that, you’ll want to rate-limit events by the client in some way. Oban’s partitioned rate-limiting solves this.
  • Retries. Sometimes, event processing will fail, and you don’t want to lose the events you were working on. You’ll need a retry mechanism and the ability to figure out what’s wrong. Oban’s retries, error reporting, and telemetry integration solve this.
  • Observability. What’s your throughput? Where are your bottlenecks? Are there errors causing this? Oban’s telemetry integration, logging, and the Web dashboard solve this.

Toss in pausing, scaling, testing, graceful shutdown, and the fact that you already have Postgres running, and Oban has a lot going for it.
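For a concrete picture, a minimal Oban setup might look like this. Queue name, limits, and the worker module are all illustrative, and the partitioned limits and chunks mentioned above are Oban Pro features not shown here:

    # config/config.exs (queue names and limits are illustrative)
    config :my_app, Oban,
      repo: MyApp.Repo,
      queues: [events: 50]

    defmodule MyApp.ProcessEvent do
      use Oban.Worker, queue: :events, max_attempts: 5

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"payload" => payload}}) do
        # Look up info from third-party APIs, run the business logic,
        # and send the outbound requests; raising here triggers a retry.
        MyApp.Pipeline.handle(payload)
      end
    end

    # In the controller: enqueue inside the request so nothing is lost,
    # then acknowledge immediately.
    %{"payload" => params}
    |> MyApp.ProcessEvent.new()
    |> Oban.insert()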

9 Likes

So, looking at your response it seems that you don’t have to immediately respond with the results.

So that’s very good. Elixir with Phoenix (or only with Plug) can handle many thousands of such requests per second. We’re talking 50k rps or more, depending on the node you’re running the app on.

And since you don’t want to lose any running tasks, I think Oban is your best bet if you are looking to build this quickly.

While I would almost always recommend getting your hands dirty and learning (because that way of learning leaves lasting memories), in this case it seems you’re leaving money on the table if you don’t deliver this soon, correct?

If so, you’re also likely better off contacting someone on this forum (or from your network) to help you build this, and paying them an appropriate consultant/programmer fee (your company should fund this, though it’s unclear whether that would involve too much bureaucracy).

EDIT: Thinking about this a bit more: if you opt for storing your tasks in RabbitMQ or Kafka, so as to avoid overwhelming your DB pool when the upstream CMS starts sending you tons of requests (which, sadly, is a legitimate worry), then you might get away with your homegrown solution, i.e. (1) accept the request, (2) store it in RabbitMQ/Kafka, (3) respond with OK, and later (4) have background workers pick up the tasks from the message queue.
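Sketching steps (1)-(3) with the amqp library, heavily simplified; in production you’d hold a long-lived channel in a supervised process rather than opening one per request, and `params` stands in for the inbound POST body:

    # One-time setup: connection, channel, and a durable queue.
    {:ok, conn} = AMQP.Connection.open("amqp://localhost")
    {:ok, chan} = AMQP.Channel.open(conn)
    {:ok, _} = AMQP.Queue.declare(chan, "cms_events", durable: true)

    # Per request: persist the message, then respond with 200 OK.
    payload = Jason.encode!(params)

    :ok =
      AMQP.Basic.publish(chan, "", "cms_events", payload, persistent: true)

    # Step (4): background workers consume "cms_events" later.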

1 Like

@sorentwo Thank you so much for this.
I almost marked yours as “The Solution”!
Even though nobody said it, for some reason, I had it in my mind that Oban was a no-go because it wouldn’t scale, couldn’t handle so many requests, or something. Your post really makes the case for Oban in impressive fashion, and your list of considerations is so spot-on!

I am very glad to know this! If you could perhaps link any sort of documentation for how to achieve this, that would be greatly appreciated.

I am taking your advice and using Oban because of what you said here.
Many, many sincere thanks.

1 Like

@dimitarvp I can’t thank you enough.
I really appreciate you reading between the lines and striking the right balance of pedantry (not in a negative way; I just can’t think of a better word). You and @adw632 have both emphasized the learning process here, which is the greater long-term value, and I appreciate that. So let me restate this: thank you for balancing near-term and long-term goals and values here.

100%. Good insight! This is a huge opportunity for me personally, as well as my employer.

I have learned so much from just the discussion in this thread! Absorbing what you all have been so kind to share here is at least as valuable as wading into documentation and tinkering with code from the greatly diminished perspective I held before this thread. Appreciating context and understanding the bigger “why?”s behind choosing particular tools/libraries is inherently valuable and fuels strategic thinking.

I am. Considering what you and @sorentwo have said, Oban seems like exactly what I want.

Again, to all who commented here, I can’t thank you enough, and I am so impressed with this community.
Thank you all so much!

4 Likes

The article One Million Jobs a Minute with Oban should cover the baseline.

Separately, watch out for a Scaling guide in the official docs in the next week or so.

7 Likes