I have an app that makes 2000 requests per second. To make these requests I use HTTPoison, which uses hackney under the hood. Each of these requests is fired from a specific pool with 10_000 max connections, so in theory it should be fine.
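For reference, the pool setup and per-request routing look roughly like this (the pool name and option values here are placeholders, not my real config):

```elixir
# Start a named hackney pool once, e.g. in the application's start/2.
# :my_pool, the timeout and max_connections are illustrative values.
:ok = :hackney_pool.start_pool(:my_pool, timeout: 15_000, max_connections: 10_000)

# Route each request through that pool via HTTPoison's hackney options.
HTTPoison.get("https://example.com/endpoint", [], hackney: [pool: :my_pool])
```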
Problem
The problem is that all 8 CPUs are on fire, and hackney’s mailbox grows steadily over time. This translates into a steady increase in RAM usage, until there is no more RAM available and bad things start to happen.
Research
To find out what is happening, I used Observer and realized that the function with the most reductions is hackney_pool:init, followed by some others:
I am confused. I am not familiar with hackney_pool:init, but I assume it initializes a pool (judging from the name), which is even worse. I only create the pools once at startup, so why is this function being called all the time?
Also, is there a reason why this is happening? Does it mean that my pool’s max connections need to be bigger?
Each row in Observer’s Processes table represents a single process. The second column displays the process name for named processes (e.g. code_server, hackney_manager), or the initial function called when the process started if the process has no name (e.g. hackney_pool:init/1, tls_connection:init/1). If you look at the Current Function column, you’ll see that most processes are running different OTP behaviours: gen_server, gen_statem, etc.
Back to your question: hackney’s pool implementation has known faults. See https://github.com/benoitc/hackney/issues/510 and https://github.com/benoitc/hackney/issues/549. It doesn’t look like hackney’s maintainer has enough interest or time to get it fixed properly, so I would advise disabling hackney’s pool and setting up your own pool of processes, where each process performs a single HTTP request.
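One minimal way to do that is to bypass hackney’s pool per request and bound the concurrency yourself, e.g. with Task.async_stream. A sketch, where the URL list and the limits are placeholders to tune for your load:

```elixir
# hackney: [pool: false] disables hackney's pool for this request;
# max_concurrency caps how many requests run at once.
urls
|> Task.async_stream(
  fn url -> HTTPoison.get(url, [], hackney: [pool: false]) end,
  max_concurrency: 200,
  timeout: 30_000,
  on_timeout: :kill_task
)
|> Stream.run()
```

For a long-running service you would put this behind a supervised Task.Supervisor rather than a bare stream, but the idea is the same: you control the upper bound, not hackney.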
Does the memory usage keep growing? For example, in the next minute, would it have risen above 900 MB? Is the number of processes growing? You can see this on the first System page.
If you are talking to different websites, then I would suggest getting rid of the pool altogether and just doing individual requests that close the connection after use.
However, if it is the same website, then a pool is recommended to avoid reopening the same connections over and over again.
The other thing to consider is actually reducing max_connections. A very high limit makes things worse in many cases: for example, if the issue is that the upstream website cannot keep up with the load, opening more connections will make everything worse. This does not rule out the other advice in this thread, though; it may be all of those factors combined.
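Concretely, that just means starting the pool with a much smaller cap (the numbers below are illustrative, not a recommendation for your exact workload):

```elixir
# A few hundred connections is often plenty; tune max_connections
# down from 10_000 and watch whether throughput actually drops.
:ok = :hackney_pool.start_pool(:my_pool, timeout: 15_000, max_connections: 200)
```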
Good idea. I am completely sure that the receiving website can handle the heat; it is quite well tested by now, and load shouldn’t be an issue for the receiving end.
However, the sending end (the one this thread is about) is suffering, and I don’t yet know how to fix it…
I am thinking about a mix of poolboy and HTTPotion (mainly because, AFAIK, HTTPotion doesn’t support pools), but as I said, I am wary of reinventing the wheel all on my own.
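For what it’s worth, the poolboy side of that doesn’t have to be much code. A rough sketch, with all module and pool names hypothetical (the worker here uses HTTPoison with hackney’s pool disabled, but any HTTP client would slot in):

```elixir
defmodule MyApp.HTTPWorker do
  use GenServer

  def start_link(_args), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(_), do: {:ok, nil}

  @impl true
  def handle_call({:get, url}, _from, state) do
    # One request per call, bypassing hackney's own pool.
    {:reply, HTTPoison.get(url, [], hackney: [pool: false]), state}
  end
end

# In your supervision tree:
pool_spec =
  :poolboy.child_spec(:http_pool,
    name: {:local, :http_pool},
    worker_module: MyApp.HTTPWorker,
    size: 50,
    max_overflow: 10
  )

# Checking out a worker to make a request:
:poolboy.transaction(:http_pool, fn pid ->
  GenServer.call(pid, {:get, "https://example.com"})
end)
```

The pool size then becomes your concurrency limit, which is exactly the knob that seems to be missing right now.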
Not sure if it’s helpful at this point, since you’re on the right track, but a while back I did some benchmarks with a friend, just to understand how much of a bottleneck the wire can be, and these were the results:
All in all, the conclusion I drew was that spinning up too many connections can be detrimental to the application, simply because the network can’t handle it. It’s not really about your machine or the server, but the actual network. Having a pool of workers which handles the connections for you should solve the problem, I think.