Soft rate-limiting incoming API requests / connections by slowing them down?

hubertlepicki · November 21, 2020, 2:19pm

Hey all,

I have a slightly unusual problem. One of our clients has a public API that is built on Elixir / Phoenix / Postgres.

I am thinking of rate-limiting access to the API because we are already getting hundreds of requests per second, and a lot of customers seem to have built very lazy clients that just hit our API every second or so.

I can fairly easily implement something like plug-attack, which would return a 401 / 403 / custom status code instantly, and I actually tried this approach. However, it fails to prevent attempts to contact the API, and it also breaks some of the customers' code that consumes the API in naive loops.

I wonder if I would get myself into trouble by implementing a Plug or cowboy handler that pauses, instead of returning a status code instantly, for these abusive clients.

So, when a user goes above the rate limit, we simply :timer.sleep(10_000) in a Plug or similar.

Has someone here tried such an approach, or maybe knows of a tool that I can use in front of my API to facilitate this behavior?

I am slightly worried about doing that on the Elixir side, as this would mean maintaining these connections in memory, and also possibly hitting system limits on open file descriptors etc. But maybe it’d be simply OK.

rjk · November 21, 2020, 2:54pm

I think it’s perfectly fine to return 429 status codes if it’s really excessive from their side. It sort of should always be in place if you want to make sure you stay up for all your clients. It’s the only official back pressure you can use towards (REST) API clients.

That said, if you want to try out delaying the requests, I agree it’s not smart to do it within Elixir/BEAM space, as you will have to tweak and monitor a lot before you’re really sure it works as intended. (You will drain max conns and TCP somaxconn buffers quite fast if you’ve got a fast client.) At the same time, it’s a very implicit way to deal with things and therefore very hard to debug for both sides later on.

If you want to experiment with delays or other forms of rate limiting I would look into something like the following:

But I would still advise to go the 429 status route because it’s well known and very explicit.
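For reference, the 429 route could look roughly like the plug below. This is only a minimal sketch: the module name `TooManyRequestsPlug`, the `:check` and `:retry_after_seconds` options, and the 10 second default are all made up for illustration, and the actual "is this client over the limit?" decision is left to whatever function you pass in. It assumes the `plug` package is available, as it is in any Phoenix app.

```elixir
defmodule TooManyRequestsPlug do
  @moduledoc "Illustrative plug: reply 429 with Retry-After when a client is over its limit."
  @behaviour Plug
  import Plug.Conn

  @impl true
  def init(opts), do: opts

  @impl true
  def call(conn, opts) do
    # :check is a hypothetical option: a 1-arity function that takes the conn
    # and returns true when the caller has exceeded its limit.
    check = Keyword.fetch!(opts, :check)
    # Hypothetical option; 10 seconds is an arbitrary default for this sketch.
    retry_after = Keyword.get(opts, :retry_after_seconds, 10)

    if check.(conn) do
      conn
      |> put_resp_header("retry-after", Integer.to_string(retry_after))
      |> send_resp(429, "Too Many Requests")
      |> halt()
    else
      # Under the limit: pass the connection through untouched.
      conn
    end
  end
end
```

In a router you would pass a remote function capture for `:check` (something like `&MyApp.Limits.over_limit?/1`, a hypothetical name), since plug options are evaluated at compile time. The Retry-After header gives well-behaved clients an explicit hint about when to come back.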

Good luck

chasers · November 21, 2020, 2:58pm

Yeah I think your concerns are valid here. In general it sounds like not such a good idea. This is odd behavior, and not explicit. You’re going to be getting clients who wonder wtf is going on when your API is super slow.

IMO, do it like you normally would and fix your clients’ client. Or just make it so you can set each client at a different rate limit.

hubertlepicki · November 21, 2020, 4:09pm

@rjk this looks very promising:

I will investigate that, thank you so much!

l00ker · November 21, 2020, 4:22pm

Here’s a link to a blog post by Chris McCord in 2017 where he discusses building API rate limiting via ETS. You might find some useful ideas there as well.

Optimizing Your Elixir and Phoenix projects with ETS
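To give a rough idea of what ETS-based rate limiting can look like, here is a minimal fixed-window counter. It is not the code from the blog post; the module name `RateLimitSketch`, the table name, and the `check/4` signature are assumptions made for this sketch, and it uses a simple "N requests per window" rule.

```elixir
defmodule RateLimitSketch do
  @moduledoc "Fixed-window request counter backed by ETS (illustrative only)."

  @table :rate_limit_sketch

  # Create the named table once, e.g. from a long-lived process at startup.
  # The table is owned by the calling process and disappears when it exits.
  def init do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
    :ok
  end

  # Returns :ok while `key` has made at most `limit` requests in the current
  # window, and {:error, :rate_limited} once it goes over.
  # `now_ms` is injectable so the function is easy to test.
  def check(key, limit, window_ms, now_ms \\ System.system_time(:millisecond)) do
    # All timestamps inside the same window map to the same integer.
    window = div(now_ms, window_ms)

    # Atomically insert {{key, window}, 0} if missing, then increment by 1.
    count = :ets.update_counter(@table, {key, window}, {2, 1}, {{key, window}, 0})

    if count <= limit, do: :ok, else: {:error, :rate_limited}
  end
end
```

Usage would be something like `RateLimitSketch.check(client_id, 5, 1_000)` for 5 requests per second per client. Note that this sketch never deletes counters for old windows, so a real version would need a periodic sweep of the table.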

al2o3cr · November 21, 2020, 7:12pm

This is going to tie up a Cowboy acceptor process; if enough requests come in that trip the rate limit, you’ll eventually end up with EVERY acceptor sleeping and everything else waiting.

The customer’s code is already broken, this is just making it more apparent. If they’re requesting data over the network, they need to be prepared to handle network failures (or at least retry).

carterbryden · November 21, 2020, 8:05pm

Like rjk said, rate limiting with nginx in front can be as easy as adding a line to your virtual host. You can rate limit by number of requests, with room for bursts if you want, or you can limit it by bandwidth.