Help diagnosing 50x responses from a Phoenix app

Hi,

We’ve been hosting a Phoenix application on AWS behind a load balancer for a number of years without issue. It serves our website and booking system.

Yesterday the site stopped responding for about 3 minutes. The AWS logs show that during that window the load balancer returned a mix of 502, 503 and 504 responses. Our Phoenix application reported no errors.

After checking more AWS logs, it appears our CPU usage spiked to 95% just before the errors started being returned. Is there any way I can diagnose what caused this spike? The Phoenix app reported no 50x responses in its error logs, so I’d assumed the problem was at the load balancer’s end, but after looking at the CPU usage I’m guessing the spike is the cause?

Any help on the best way to diagnose CPU usage (if this is indeed the issue) is greatly appreciated.

I have my doubts that your app is the culprit, but it can happen.

I would check your telemetry to see whether you had a huge influx of clients or a DDoS attack, as Phoenix has a mechanism for dropping requests when it is no longer feasible to respond to them in time (I’m not sure it’s documented anywhere; I heard about it on this forum from @josevalim).

Yeah, a 502 from a load balancer generally means “I can’t get a response from my backend servers”. As for the high CPU, unfortunately the possible causes are extremely varied. As someone already suggested, check the request rate and make sure you weren’t getting a request flood. Beyond that, hopefully you’re logging or recording enough system telemetry to see what’s up.
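
If you aren’t recording anything yet, here is a minimal sketch of a VM snapshot you could log periodically or run from a remote console the next time CPU climbs. `VmSnapshot` is a hypothetical module name of my own, not something Phoenix ships; it only uses standard `:erlang` calls. A run queue that stays well above zero suggests the schedulers are saturated.

```elixir
defmodule VmSnapshot do
  # Hypothetical helper for illustration; not part of Phoenix.
  require Logger

  @bytes_per_mb 1_048_576

  # Returns a map of coarse VM health indicators.
  def take do
    %{
      # Total number of processes waiting for a scheduler.
      run_queue: :erlang.statistics(:total_run_queue_lengths),
      process_count: :erlang.system_info(:process_count),
      total_memory_mb: div(:erlang.memory(:total), @bytes_per_mb),
      process_memory_mb: div(:erlang.memory(:processes), @bytes_per_mb),
      binary_memory_mb: div(:erlang.memory(:binary), @bytes_per_mb)
    }
  end

  # Logs the snapshot and returns it.
  def log do
    snapshot = take()
    Logger.info("vm_snapshot " <> inspect(snapshot))
    snapshot
  end
end
```

Calling `VmSnapshot.log()` on a timer gives you a history to line up against the load balancer’s timestamps.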

The other thing to check would be memory usage. I’ve seen cases where a process grows rapidly to multiple gigabytes and garbage collection ends up using a LOT of CPU.
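
One way to spot such a process, assuming you can attach a remote console to the running node, is to rank processes by memory. This is only a sketch: `TopProcesses` is a hypothetical name, and it relies solely on `Process.list/0` and `Process.info/2` from the standard library.

```elixir
defmodule TopProcesses do
  # Hypothetical helper for illustration.
  # Ranks live processes by :memory (bytes), :reductions, or :message_queue_len.
  def by(key, limit \\ 5) when key in [:memory, :reductions, :message_queue_len] do
    Process.list()
    |> Enum.map(fn pid ->
      {pid, Process.info(pid, [key, :registered_name, :current_function])}
    end)
    # Process.info/2 returns nil if the process exited in the meantime.
    |> Enum.reject(fn {_pid, info} -> is_nil(info) end)
    |> Enum.sort_by(fn {_pid, info} -> info[key] end, :desc)
    |> Enum.take(limit)
  end
end
```

`TopProcesses.by(:memory)` lists the five largest processes; `TopProcesses.by(:message_queue_len)` can reveal a process that is falling behind. Note that `:reductions` is cumulative since each process started, so it favors long-lived processes unless you compare two samples.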

Finally, look at operations that use NIFs or are otherwise explicitly CPU-intensive. These commonly involve cryptography, such as password hashing or any other explicit crypto operation.