Hi all, I’m encountering an issue with a production app I’ve inherited and I’m having trouble getting to the bottom of it.
The app is an API that communicates with a JS client over both HTTP and WebSockets. It’s a speed test app, and the usual user flow involves a few HTTP requests (log and result storage) plus a few WS connections to three different servers running the same Phoenix app. The WS connections measure speed by sending a high number of small packets, as messages on a WS channel, to or from the server (depending on whether download or upload speed is being measured) over a 10-second period.
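For context, the channel code is roughly shaped like this (a minimal sketch with illustrative module and event names, not our actual code):

```elixir
defmodule SpeedtestWeb.SpeedChannel do
  use Phoenix.Channel

  def join("speed:" <> _test_id, _params, socket) do
    {:ok, socket}
  end

  # Download test: the client requests packets and the server pushes them back.
  def handle_in("pull", _payload, socket) do
    push(socket, "chunk", %{data: Base.encode64(:crypto.strong_rand_bytes(1024))})
    {:noreply, socket}
  end

  # Upload test: the client streams small packets and the server just acks them.
  def handle_in("push", _payload, socket) do
    {:reply, :ok, socket}
  end
end
```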
Everything worked fine until we looked at what happens with 10-15 concurrent users. At that point, all HTTP responses and WS connections take around 15 seconds to resolve. Even the simplest HTTP endpoints, which don’t touch the DB and are usually lightning fast, are delayed.
I’m trying to figure out whether we’re hitting a connection pool limit or something similar, or whether requests might be getting queued in some process’s message queue. Could the sheer volume of WS messages somehow be interfering with HTTP request processing?
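For reference, these are the limits I know how to check (hypothetical app/module names; the commented defaults are what I believe recent plug_cowboy and Ecto ship with, so please correct me if they’re off):

```elixir
# config/prod.exs
config :speedtest, SpeedtestWeb.Endpoint,
  http: [
    port: 4000,
    transport_options: [
      num_acceptors: 100,      # plug_cowboy default: 100
      max_connections: 16_384  # plug_cowboy default: 16_384; a low override here would queue new connections
    ]
  ]

# The Ecto pool, in case the result-storing requests are contending for DB connections:
config :speedtest, Speedtest.Repo,
  pool_size: 10 # Ecto default: 10
```

Neither of these looks low enough to explain a wall at 10-15 users, but I may be missing another knob entirely.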
I’ve attached a console to the running deployed app using --remsh and tried to inspect the running processes during these peak times when everything slows down. While most of the processes with long message queues seem to be the ones handling WebSockets, I’m having trouble using Process.info() to figure out why everything is responding so slowly.
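In case it’s useful, this is the rough one-off I’ve been running in the remote shell to rank processes by message queue length:

```elixir
Process.list()
|> Enum.map(fn pid ->
  {pid, Process.info(pid, [:message_queue_len, :current_function, :registered_name])}
end)
|> Enum.reject(fn {_pid, info} -> is_nil(info) end)  # drop processes that died mid-scan
|> Enum.sort_by(fn {_pid, info} -> -info[:message_queue_len] end)
|> Enum.take(10)
```

(I haven’t tried recon yet; my understanding is that `:recon.proc_count(:message_queue_len, 10)` would give a similar ranking with less overhead on a loaded node.)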
Has anyone encountered anything similar, or does anyone have suggestions on how to proceed with the investigation?