Debug clients disconnecting from channels

I am doing benchmarks/load testing of an application based on WebSockets (clients are mobile apps sending GPS coordinates).

For now I am doing this with two laptops: one runs the server, and the other runs a script that simulates concurrent clients, say 1,000 or 2,000, using phoenix_client. (Please tell me if there is a better way to do this!)

While the server seems to be doing fine, my sockets monitor is telling me that some of these clients disconnect during the run. Is there a way to find out why?

@sb8244 hey! do you have any advice here perhaps?

Hm, my initial thought here would be to make sure that the heartbeat is going through, and that your load balancer isn’t killing connections (if you have one). Those could be silent killers.

I’d also check the logs to make sure there are no errors.

If you get a disconnection, does it automatically re-connect? Does the client-side think that it’s connected, but you know it’s not based on the monitor?

The PhoneClient library is great for this, in my opinion. I’ve used Tsung for high-load testing (60k+ connections), but it’s not friendly to work with and takes far more effort.

@sb8244 Thanks!

Yeah, there is no load balancer (it is mix phx.server on my dev laptop, accessed from another laptop). And there are no errors in the logs. I wondered if this could be a known gotcha like “oh, that library on one single MBP won’t be able to juggle 2K or 3K sockets”. In my production setup I’ll have as many sockets as mobile phones, so the purpose of this is just to get a rough idea of the concurrency the server can handle on one node.

I’ll try to dig into this somehow.

What is PhoneClient BTW?

Sorry, I meant PhoenixClient, the library you’re using. I think that got auto-corrected by my phone :laughing:

So the one thing I can think of locally is the limit on open file descriptors. However, that limit prevents new connections from being made; it should never kill off an existing connection. 2-3k sounds about right for the default that I’ve seen. I forget the exact commands here, but it is definitely something to look into if you see an issue in establishing new connections (vs dropping them).
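For reference, here is a minimal sketch of how the file limit mentioned above could be inspected. It is written in Python rather than Elixir purely for illustration and is not from the thread. Each open socket consumes one file descriptor, so the soft limit caps how many connections a single process can hold. The helper names and the `reserved` allowance of 64 descriptors are assumptions. In a shell, `ulimit -n` prints the same soft limit.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_hold(connections, reserved=64):
    """Rough check: does the soft limit leave room for `connections` sockets?

    `reserved` is an assumed allowance for stdin/stdout, log files and
    other descriptors the process needs besides the sockets themselves.
    """
    soft, _hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= connections + reserved


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft={soft} hard={hard}")
    for n in (1000, 2000, 3000):
        verdict = "fit" if can_hold(n) else "exceed the soft limit"
        print(f"{n} sockets {verdict}")
```

The limit is per process, so it is worth checking on both machines: the one running the server and the one running the simulated clients.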

@sb8244 You were right, the maximum number of file descriptors on the Mac was set to 256, which must be the default because this is a new MBP. That also explains why the traces of the monitors did not follow a clear pattern.
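The thread does not say which command was used to raise the limit. As a hedged illustration only, the Python sketch below raises the soft descriptor limit of the current process toward a target, capped by the hard limit. The function name `raise_fd_limit` and the target of 4096 are assumptions. The change applies only to this process and its children; a server started from a shell inherits that shell's limit instead, which `ulimit -n <number>` adjusts.

```python
import resource


def raise_fd_limit(target):
    """Try to raise the soft open-file limit to `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit can never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the current limit is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some systems reject values above a kernel-wide maximum.
        return soft
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit is now", raise_fd_limit(4096))
```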

This is going like a rocket now.

Thanks for your help, I am sure it would have taken me a lot of debugging time to realize this was not happening in Elixir itself.

Happy to hear it’s working for you now!! :raised_hands: