I have an app that makes 2000 requests per second. To make these requests I use HTTPoison, which uses hackney under the hood. Each of these requests is fired from a specific pool with 10_000 max connections, so in theory it should be fine.
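For reference, the pool setup and per-request routing look roughly like this (the pool name and option values here are placeholders, not my real config):

```elixir
# Start a named hackney pool once, e.g. in the application's start/2.
# :my_pool, the timeout and max_connections are illustrative values.
:ok = :hackney_pool.start_pool(:my_pool, timeout: 15_000, max_connections: 10_000)

# Route each request through that pool via HTTPoison's hackney options.
HTTPoison.get("https://example.com/endpoint", [], hackney: [pool: :my_pool])
```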
Problem
The problem is that all 8 CPUs are on fire, and hackney’s mailbox grows steadily over time. This translates into a steady increase in RAM usage, until there is no more RAM available and bad things start to happen.
Research
To find out what is happening, I used Observer and realized that the function with the most reductions is hackney_pool:init, followed by some others:
I am confused. I am not familiar with hackney_pool:init, but I assume it initializes a pool (judging from the name), which is even worse. I only create the pools once at startup, so why is this function being called all the time?
Also, is there a reason why this is happening? Does it mean that my pool’s max connections need to be bigger?
Each row in Observer’s Processes table represents a single process. The second column displays the process name for named processes (e.g. code_server, hackney_manager), or the initial function called when the process started if the process has no name (e.g. hackney_pool:init/1, tls_connection:init/1). If you look at the Current Function column, you’ll see that most processes are running different OTP behaviours: gen_server, gen_statem, etc.
Back to your question: hackney’s pool implementation has known faults. See https://github.com/benoitc/hackney/issues/510 and https://github.com/benoitc/hackney/issues/549. It doesn’t look like hackney’s maintainer has enough interest or time to get it fixed properly, so I would advise disabling hackney’s pool and setting up your own pool of processes, where each process performs a single HTTP request.
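One minimal way to do that is to bypass hackney’s pool per request and bound the concurrency yourself, e.g. with Task.async_stream. A sketch, where the URL list and the limits are placeholders to tune for your load:

```elixir
# hackney: [pool: false] disables hackney's pool for this request;
# max_concurrency caps how many requests run at once.
urls
|> Task.async_stream(
  fn url -> HTTPoison.get(url, [], hackney: [pool: false]) end,
  max_concurrency: 200,
  timeout: 30_000,
  on_timeout: :kill_task
)
|> Stream.run()
```

For a long-running service you would put this behind a supervised Task.Supervisor rather than a bare stream, but the idea is the same: you control the upper bound, not hackney.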
Does the memory usage keep growing? For example, in the next minute, would it have risen above 900 MB? Is the number of processes growing? You can see this on the first System page.
If you are talking to different websites, then I would suggest getting rid of the pool altogether and just doing individual requests that close the connection after use.
However, if it is the same website, then a pool is recommended to avoid reopening the same connections over and over again.
The other thing to consider is actually reducing max_connections. A very high limit makes things worse in many cases: for example, if the issue is that the upstream website cannot keep up with the load, opening more connections will make everything worse. This does not rule out the other advice in this thread, though; it may be all of those factors combined.
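Concretely, that just means starting the pool with a much smaller cap (the numbers below are illustrative, not a recommendation for your exact workload):

```elixir
# A few hundred connections is often plenty; tune max_connections
# down from 10_000 and watch whether throughput actually drops.
:ok = :hackney_pool.start_pool(:my_pool, timeout: 15_000, max_connections: 200)
```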
Good idea. I am completely sure that the receiving website can handle the heat; it is quite well tested by now, and load shouldn’t be an issue for the receiving end.
However, the sending end (the one this thread is about) is suffering, and I don’t yet know how to fix it…
I am thinking about a mix of poolboy and HTTPotion (mainly because, AFAIK, HTTPotion doesn’t support pools), but as I said, I am wary of reinventing the wheel all on my own.
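For what it’s worth, the poolboy side of that doesn’t have to be much code. A rough sketch, with all module and pool names hypothetical (the worker here uses HTTPoison with hackney’s pool disabled, but any HTTP client would slot in):

```elixir
defmodule MyApp.HTTPWorker do
  use GenServer

  def start_link(_args), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(_), do: {:ok, nil}

  @impl true
  def handle_call({:get, url}, _from, state) do
    # One request per call, bypassing hackney's own pool.
    {:reply, HTTPoison.get(url, [], hackney: [pool: false]), state}
  end
end

# In your supervision tree:
pool_spec =
  :poolboy.child_spec(:http_pool,
    name: {:local, :http_pool},
    worker_module: MyApp.HTTPWorker,
    size: 50,
    max_overflow: 10
  )

# Checking out a worker to make a request:
:poolboy.transaction(:http_pool, fn pid ->
  GenServer.call(pid, {:get, "https://example.com"})
end)
```

The pool size then becomes your concurrency limit, which is exactly the knob that seems to be missing right now.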
Not sure if it’s helpful at this point, since you’re on the right track, but a while back I did some benchmarks with a friend, just to understand how much of a bottleneck the wire can be, and these were the results:
All in all, the conclusion I drew was that spinning up too many connections can be detrimental to the application, simply because the network can’t handle it. It’s not really about your machine or the server, but the actual network. Having a pool of workers which handles the connections for you should solve the problem, I think.