Debugging and measuring WebSocket communication

mvrkljan · April 2, 2020, 9:25am

Hi everyone,

I’m running into a few hard-to-analyse WebSocket issues with Phoenix, so I was hoping someone might shed some light on a few things:

Is there any way to measure how long it takes for a message sent over a Channel to be processed before the server replies to the client? For context, I’m working on an app that measures connection speed using WS and large messages, and I was wondering whether the size of a message significantly affects processing time on the server.
Is there a limit to how many processes can be spawned to handle WS connections? If so, can you point me to a resource where I can read more about it or learn how to configure it?
What mechanisms are available to tackle situations where messages sent via WS are coming in faster than they can be processed?

Thank you!

SophieDeBenedetto · April 2, 2020, 10:43am

Hi @mvrkljan! Phoenix emits a few Telemetry events for socket and channel activity that you might want to check out.

Phoenix executes a Telemetry event ([:phoenix, :socket_connected]) when a socket connects, here
Phoenix executes a Telemetry event ([:phoenix, :channel_joined]) when a channel is joined, here
Phoenix executes a Telemetry event ([:phoenix, :channel_handled_in]) when a channel processes an incoming message in handle_in, and that event includes the duration of the handle_in call, here

That last event sounds like it would be the most helpful to you. Not sure if you’ve worked with Telemetry before, but it is pretty cool! This Erlang Solutions blog post explains it pretty well (https://www.erlang-solutions.com/blog/introducing-telemetry.html), and I’m working on a series of articles for Elixir School on how to use Telemetry. They’re not published yet, but you can check out the drafts here (feedback welcome!).
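For the handle_in event, a minimal handler might look like the sketch below. It assumes the Phoenix 1.4/1.5 event shape (a `duration` measurement in native time units, with `:event` and `:params` in the metadata); the module name, handler id, and the use of `:erlang.external_size/1` as a rough payload-size proxy are illustrative choices, not anything Phoenix prescribes.

```elixir
defmodule MyAppWeb.ChannelTimings do
  # Hypothetical module name; adapt it to your application.
  require Logger

  @event [:phoenix, :channel_handled_in]

  # Call once at startup, e.g. from your Application.start/2.
  # :telemetry is already a dependency of Phoenix.
  def attach do
    :telemetry.attach("log-channel-handled-in", @event, &__MODULE__.handle_event/4, nil)
  end

  # Assumes Phoenix reports `duration` in native time units.
  def handle_event(@event, %{duration: duration}, metadata, _config) do
    micros = System.convert_time_unit(duration, :native, :microsecond)
    # Rough proxy for message size: the decoded params measured in
    # Erlang external term format, NOT the raw WebSocket frame size.
    bytes = :erlang.external_size(Map.get(metadata, :params))
    Logger.info("handle_in #{inspect(Map.get(metadata, :event))}: #{micros} µs, ~#{bytes} bytes")
    {micros, bytes}
  end
end
```

Telemetry handlers run synchronously in the process that emits the event (here the channel process), so keep them cheap. Note also that this duration only covers the handle_in callback itself: if the logged times stay small while clients still see slow replies, the delay is happening before handle_in runs (in the process mailbox or on the network) rather than inside it.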

I’m no expert on benchmarking Phoenix, and you’ve likely seen this before, but this article talks about some benchmarking practices. Tsung, the benchmarking tool mentioned there, is pretty easy to work with IMHO.

As for how many processes are spawned to handle incoming WS connections, the benchmarking article linked above shows an example in which the system process limit is reached, which I would say is more likely to happen before you hit any Phoenix or Elixir process limit.
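To see how much headroom the VM itself has, you can compare the number of live BEAM processes with the VM's configured process limit, both available from `:erlang.system_info/1`. The sketch below is illustrative (the module name is hypothetical). Keep in mind that Phoenix uses one process per socket connection plus one per joined channel, that the VM limit can be raised with the `+P` emulator flag (e.g. in a release's `vm.args`), and that OS limits such as the open file descriptor limit (`ulimit -n`) often bite first.

```elixir
defmodule ProcessHeadroom do
  # Hypothetical helper: compares live BEAM processes with the VM limit.
  def report do
    count = :erlang.system_info(:process_count)
    limit = :erlang.system_info(:process_limit)
    # `/` always returns a float, so Float.round/2 is safe here.
    %{count: count, limit: limit, used_percent: Float.round(count * 100 / limit, 2)}
  end
end
```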

Def curious about what feature you’re supporting where you feel that WS connections are maxing out though.

mvrkljan · April 2, 2020, 2:33pm

Thank you @SophieDeBenedetto!

I’ll definitely look into taking advantage of Telemetry, and thanks for the benchmarking article link.

We’re noticing situations where a reply from the server over WS takes a while when multiple users are running our speed test at once, but CPU usage isn’t spiking or even getting close, so my initial thought was that we might be hitting some limit or running into a processing-queue issue. We’re still reviewing our implementation, so we’ll see what we find out.

Thanks!

benwilson512 · April 2, 2020, 2:43pm

Each WebSocket connection runs in its own process, so if your schedulers aren’t pegged, the slowdown probably isn’t caused by connections directly interfering with each other. However, be sure to check for things like database pool contention if your sockets make DB requests.
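Because each connection (and each joined channel) is a separate process, a processing-queue problem like the one suspected above would show up as a growing message queue on specific processes. A minimal ad hoc check, assuming you can run code on the node (e.g. from a remote IEx shell), is to list the processes with the longest mailboxes; the module name is hypothetical, and `Process.info/2` is the standard API it relies on.

```elixir
defmodule MailboxCheck do
  # Hypothetical helper: lists the `n` local processes with the longest
  # message queues. It walks every process, so run it ad hoc, not on a hot path.
  def top_queues(n \\ 5) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      case Process.info(pid, :message_queue_len) do
        {:message_queue_len, len} -> [{pid, len}]
        # The process exited between Process.list/0 and Process.info/2.
        nil -> []
      end
    end)
    |> Enum.sort_by(fn {_pid, len} -> len end, :desc)
    |> Enum.take(n)
  end
end
```

If channel or pool processes keep appearing at the top with large queues while CPU stays low, that points to a bottleneck such as a shared process or DB pool rather than raw compute.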