I have an expensive CPU-bound action that I want to serve through a public internet endpoint. It doesn’t matter what it does, only that it usually takes some time (seconds) and that it costs CPU (say a few percent of CPU usage). For a real use case, think of an expensive hashing algorithm like PBKDF2 on a sign-in endpoint.
What I want to avoid is being hit by a flood of requests and suffering resource starvation, where everything becomes unresponsive. As in the sign-in case, a public endpoint without authentication makes a perfect target for attacks that aim to bring the server down. Assume all the network-level protections are in place: rate limiting, WAFs, and so on.
I thought about two strategies on the BEAM:
1 - Using a pool of processes. I’ve implemented this with poolboy and it works fine, but it is hard to tune. I have to benchmark the cost of the function on a production server and arrive at a pool size that gives a good enough “sharing” of resources. I am thinking about switching to wpool and having a bigger-than-needed pool with a callback module that checks CPU usage before dispatching to the pool. This seems very unreliable and error-prone to me… If several concurrent requests arrive while the CPU usage from all of them is still under control, they would all start at the same time and the CPU would spike anyway…
2 - Using a slave node dedicated to just this operation and fighting with the emulator flags to make it use other logical cores. Suppose I have 8 CPUs available: I could dedicate 2 or 3 to the slave node and the rest to the main system. However, with this strategy the slave node would still be vulnerable to resource starvation, making all calls to it fail, which is not my intention here. I’d rather have timeouts than a denial of service.
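For concreteness, a hedged sketch of what strategy 2 could look like on Linux: `+S` caps the number of BEAM schedulers per node, and `taskset` pins each node to specific cores. Node names, core ranges and scheduler counts below are purely illustrative, not a recommendation.

```shell
# Main node: 5 schedulers, pinned to cores 0-4
taskset -c 0-4 elixir --sname main --erl "+S 5:5" -S mix run --no-halt &

# Worker node: 3 schedulers, pinned to cores 5-7,
# dedicated to the expensive operation
taskset -c 5-7 elixir --sname worker --erl "+S 3:3" -S mix run --no-halt &
```
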
So, I’d like to know if there are any other algorithms or strategies for dealing with this. I appreciate your time.
Have you thought about measures to prevent clients from misbehaving? For example, with public sign-in pages, rate limits or tools like reCAPTCHA can limit the impact an individual client can have.
I have thought about it. Rate limiting is applied, but a DDoS would get through anyway. reCAPTCHA is also applied but, well, Google went down recently and brought reCAPTCHA with it… which is what made me think about this.
In any case, I wanted to know if there are papers on the subject or other strategies to consider.
How much time can a user wait for a response? I mean, if a job normally takes 10 seconds to finish, can the user wait something like 10 minutes for it? Is it crucial that the calculation happens synchronously, or can the system get on with other work in the meantime?
I mean, if a user can wait a long time, then maybe you can queue these jobs and process them with a job processor.
This way, if the system is flooded, only the queue grows and the CPU is not maxed out. You can monitor the queue length to see whether your system is doing fine or lagging behind. If you notice the system is not handling the load, you can scale your job processors for a limited time to clear the queue, or scale the system to handle more load. You can also notify your users that you are under load. Maybe you can also analyze the queue and remove malicious jobs, if there are any.
You can do more fine-tuning too: multiple queues (high/low priority), different job processors for each, and reasonably easy horizontal scaling, to name a few.
There are a number of queue job processing libraries available for Elixir, a simple search reveals them.
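As an illustration of the job-processor idea, here is a hedged sketch using Oban (one of those libraries; it persists jobs in Postgres, so the queue survives restarts). The module name, queue name, and concurrency limit are made up for the example:

```elixir
# Illustrative Oban worker: the expensive computation becomes a
# persisted job, and the queue's concurrency limit caps CPU usage.
defmodule MyApp.ExpensiveWorker do
  use Oban.Worker, queue: :expensive, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"input" => input}}) do
    _result = do_expensive_work(input)
    :ok
  end

  # placeholder for the real expensive function
  defp do_expensive_work(input), do: input
end

# In config.exs, limit the queue to e.g. 4 concurrent jobs:
#   config :my_app, Oban, queues: [expensive: 4]
#
# Enqueueing from the request handler:
#   %{input: value} |> MyApp.ExpensiveWorker.new() |> Oban.insert()
```
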
This is similar to the process pool approach, I believe. There, the queue is the process mailbox; here, it is a persistent queue or an external system’s queue. In either case it is hard to tune properly across different environments.
I wonder if there is some algorithm to auto-tune it according to available resources.
There is a subtle difference: here the queue is external rather than a process mailbox, so it is more fault tolerant and easier to analyse.
I don’t know of such an algorithm, but have a look at the Horizon package that Laravel provides for its queue system. It has some configuration options that may help you or give you ideas.
Using an external queue system (be it RabbitMQ, Oban, Kafka or whatever) makes it much harder to reply synchronously to the end user. I could pass the request’s PID as an argument and send a reply later, or persist the result somewhere and check it later on, but I think this is a bit heavyweight, and it doesn’t use the environment the requests run in as the meter for accepting or rejecting more work.
What I want is closer to rate limiting, but it should consider CPU rather than requests per minute or the like. I think I will try a custom rule with plug_attack…
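As a sketch of what such a custom rule could check: an admission function based on `:cpu_sup.util/0` from OTP’s `:os_mon` application, which a plug_attack rule (or any plug) could call to block requests when the machine is already busy. The module name and the 80% threshold are made up, and note that the very first sample after `:os_mon` starts is documented as unreliable:

```elixir
# Hypothetical CPU-based admission check. A plug_attack rule could
# call CpuAdmission.admit?/1 and reject the request when it is false.
defmodule CpuAdmission do
  def admit?(threshold \\ 80) do
    # :cpu_sup lives in the :os_mon application (part of OTP)
    {:ok, _} = Application.ensure_all_started(:os_mon)

    case :cpu_sup.util() do
      # on platforms where sampling fails, fail open rather than closed
      {:error, _reason} -> true
      # util is the CPU utilisation since the last call, in percent
      util when is_number(util) -> util < threshold
    end
  end
end
```
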
Ah… yeah, I had an API product for years that was designed to be async. People would POST jobs with a callback URL; we’d hit that with the results, or provide an endpoint to fetch results once the job was finished. It worked well, and people understood why we had to do it that way given the type of work we were doing.
Other than that, I would limit concurrency with a pool, so the whole system can run at most N concurrent jobs. You don’t even need poolboy for this… just a DynamicSupervisor limiting the number of children.
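For instance, a minimal sketch of that idea (the module name, the Task-based children, and the default cap are all illustrative, not a recommendation):

```elixir
# A system-wide concurrency cap without poolboy: a DynamicSupervisor
# with :max_children. Once the limit is reached, start_child/2 returns
# {:error, :max_children}, which the caller can turn into an HTTP 503
# instead of letting the CPU starve.
defmodule Expensive.Limiter do
  use DynamicSupervisor

  def start_link(opts) do
    DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    # 2 * schedulers is only a starting guess, not a rule
    max = Keyword.get(opts, :max_children, 2 * System.schedulers_online())
    DynamicSupervisor.init(strategy: :one_for_one, max_children: max)
  end

  # Runs `fun` in a supervised Task if a slot is free.
  def run(fun) when is_function(fun, 0) do
    case DynamicSupervisor.start_child(__MODULE__, {Task, fun}) do
      {:ok, _pid} -> :ok
      {:error, :max_children} -> {:error, :overloaded}
    end
  end
end
```
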
I personally wouldn’t try to base it on CPU. I’m sure it can be done, but I’d start with concurrency limiting and play with that. Maybe then add a failsafe based on CPU if you want to push the boundaries a bit more.
In that case I think you could just use a big enough pool (say 2 * core count) for low priority requests, and do the work directly in the request handler for high priority requests, bypassing the pool.
Reject low-priority requests if the pool is full. You can also use sbroker if you want to get fancy and allow some queuing to absorb sudden spikes.
Even if the pool is saturated and CPU usage reaches 100%, the app will stay responsive, since the BEAM context-switches very frequently.
Thanks! That is my current approach, and it works quite well, with the only caveat of having to “guess” the proper pool size for each environment.
The “normal” schedulers context-switch very frequently, but dirty CPU schedulers might not (depending on the NIF implementation), so I am trying to be safe here and never allow it to reach 100%.
I think I would first use a “simple challenge” just to keep out the script kiddies, and only then run your expensive algorithm. If you have a lot of users, you could move that work to dedicated resources: a specific CPU -> server -> hardware (GPU).
I have used jobs (https://github.com/uwiger/jobs) for this kind of thing before. It lets you have a queue that is regulated based on, for example, CPU sampling.
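A hedged sketch of the basic usage from Elixir (the queue name and limit are illustrative; this shows only the simple counter regulator, capping concurrent jobs, not the CPU-sampling feedback, which needs extra configuration):

```elixir
# uwiger/jobs from Elixir: a queue regulated by a counter, so at
# most 4 of these jobs run concurrently across the whole node.
:ok = :jobs.add_queue(:expensive, regulators: [counter: [limit: 4]])

# :jobs.run/2 waits for a slot, runs the fun, then releases the slot.
result =
  :jobs.run(:expensive, fn ->
    # the actual expensive call (e.g. the PBKDF2 hash) goes here
    :crypto.hash(:sha256, "some password")
  end)
```
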
Or you can use a “circuit breaker” library to deal with this; they generally implement the right algorithms. I’ve used fuse (https://github.com/jlouis/fuse) in the past, but it hasn’t had any updates in the last couple of years, so I don’t know whether it works with the latest Erlang versions.
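For reference, a hedged sketch of the fuse pattern from Elixir (the fuse name, thresholds and timings are made up for the example):

```elixir
# jlouis/fuse: install a fuse that tolerates 5 "melts" within
# 10 seconds before blowing, and resets 60 seconds after blowing.
:ok = :fuse.install(:expensive_op, {{:standard, 5, 10_000}, {:reset, 60_000}})

case :fuse.ask(:expensive_op, :sync) do
  :ok ->
    # do the expensive work; on failure or timeout, report a melt:
    #   :fuse.melt(:expensive_op)
    :ok

  :blown ->
    # shed load instead of starving the CPU
    {:error, :unavailable}
end
```
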
Awesome! Thanks a lot for the info! I will play with uwiger/jobs; a very interesting library to have in my toolbox.
Thank you all for your replies. We have tested the process-pool approach and it has been handling things pretty well so far. The subject still interests me, though, and I will keep trying other approaches.