So I was catching up on my backlog of keynotes when I got to Chris McCord’s ElixirConf 2017 keynote. Around the 27-minute mark, he talked about a performance issue with a plug that calls GenServer.cast to log metrics.
Paraphrasing, he said something along the lines of:
“the problem we’re seeing is that these services were doing a GenServer.cast on every request to a reporter backend. The reporter backend was batching the requests, but it was sending a message to a single GenServer process on every request, so this crashed the VM under load and reduced the application to single-threaded performance”
He was referring to the code on the slide shown at that point in the talk.
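The slide itself isn't included here, so as a rough sketch of the pattern being described (this is not the code from the talk; `MetricsReporter` and `MetricsPlug` are names I made up, and the conn is a plain map so the snippet runs without the Plug dependency), I understand it to be something like:

```elixir
defmodule MetricsReporter do
  # Hypothetical reporter backend: one named GenServer that batches metrics.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Asynchronous for the caller, but every request in the system still
  # sends one message to the mailbox of this single process.
  def report(metric), do: GenServer.cast(__MODULE__, {:report, metric})

  # Synchronous helper that returns how many metrics are buffered.
  def pending, do: GenServer.call(__MODULE__, :pending)

  @impl true
  def init(_opts), do: {:ok, []}

  @impl true
  def handle_cast({:report, metric}, batch), do: {:noreply, [metric | batch]}

  @impl true
  def handle_call(:pending, _from, batch), do: {:reply, length(batch), batch}
end

defmodule MetricsPlug do
  # Hypothetical plug in the usual init/1 + call/2 shape.
  # A real one would declare `@behaviour Plug` and receive a %Plug.Conn{}.
  def init(opts), do: opts

  def call(conn, _opts) do
    # Runs on every request, in every request process.
    MetricsReporter.report({:request, conn.request_path})
    conn
  end
end
```

Each request process returns right after the cast, but all of them are sending to the same mailbox, which is the part the quote is pointing at.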
So, my questions are:
- What’s the issue with the code on the slide? I get the part about crashing the VM, since it could flood the process’s mailbox and potentially cause it to run out of memory. But how does this slow the application down to single-threaded performance? Shouldn’t GenServer.cast be asynchronous, and thus have minimal impact on the requests?
- How should it be done instead? I’ve seen some suggestions about maintaining a pool of workers, but I don’t see how that would help, since there’s still a “reporting backend” receiving the messages that would still be the bottleneck. (Or really, I see how it would become the bottleneck for metric collection, but I don’t understand how it would become a bottleneck that affects the web requests, since it runs in a separate process, ignoring resource consumption, process scheduling, etc.)