Load testing: Struggling to get more than 50K-60K connections to a Phoenix Channel using Tsung

Just checked back in on this and read the thread fully. Pretty awesome that it's working well on the VPS for you now! Sending the heartbeat is definitely important, and the effects of skipping it were seen earlier in the thread.

One thing I didn't grasp regarding k8s is whether you're running tsung on k8s or your service on k8s. If tsung is on k8s, it's possible that outgoing connections are hitting some artificial or real system limit that isn't present on the VPS (most likely ephemeral port exhaustion on the tsung workers, since each outgoing connection to the same destination needs its own source port). If tsung has always been on the VPS, then this would of course not be an issue.
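For reference, on a Linux host the ephemeral port range can be read straight out of `/proc`; a quick Elixir one-off (assumes Linux, run in `iex` on a tsung worker) to see how many source ports each worker has per destination:

```elixir
# Read the Linux ephemeral port range. Each tsung worker can open at most
# this many concurrent outgoing connections to a single destination IP:port
# before it runs out of source ports.
{:ok, raw} = File.read("/proc/sys/net/ipv4/ip_local_port_range")

[low, high] =
  raw
  |> String.split()
  |> Enum.map(&String.to_integer/1)

IO.puts("Ephemeral port range: #{low}-#{high} (#{high - low + 1} ports per destination)")
```

The default range on many distros is roughly 32768-60999, i.e. under 30K connections per worker per destination, which is why tsung setups typically spread load across several workers or virtual IPs.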


Good question, I should have made that clearer.

The tsung controllers and workers are always running on VPS.

I have a little escript that I use to provision and set up the machines on DigitalOcean (install tsung, set hostnames and limits). I spin them up whenever I run a load test, then tear them down immediately after the test.


Oh interesting. So the tsung configuration has remained static, yet the difference is that large.

You said that it's not maxing out CPU/memory on k8s, and I assume you're not seeing errors. It could be worth logging the number of processes created per second on the Elixir side to see whether the heartbeat is somehow not making it through. If you see a steady rate of process creation but a ceiling on active connections, it could indicate churn (connections being dropped and re-established). If no new processes are created after a certain point, it could indicate that the connections stop making it to the server.
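One cheap way to watch this from a remote `iex` session is to sample the BEAM's process count on an interval; a minimal sketch (module name is made up, and note this logs the net count, not joins specifically, so a flat count with tsung still connecting points at connections not arriving, while a flat count alongside ongoing channel joins points at churn):

```elixir
defmodule ProcSampler do
  # Logs the total BEAM process count once per interval, plus the delta
  # from the previous sample. During a ramp-up, the delta approximates
  # new channel/transport processes per interval.
  def run(interval_ms \\ 1_000) do
    spawn(fn -> loop(:erlang.system_info(:process_count), interval_ms) end)
  end

  defp loop(prev, interval_ms) do
    Process.sleep(interval_ms)
    count = :erlang.system_info(:process_count)
    IO.puts("processes: #{count} (delta: #{count - prev})")
    loop(count, interval_ms)
  end
end

# Usage from a remote iex on the app node:
#   pid = ProcSampler.run()
#   ... run the load test, watch the deltas ...
#   Process.exit(pid, :kill)
```

To count joins directly rather than inferring from process counts, a telemetry handler attached to Phoenix's channel-join event would be the more precise tool.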

Yes, that's right, I used the same configuration. I am not using heartbeats; instead I set the timeout on the Phoenix side to :infinity.

I suppose I should try it again with heartbeats instead. That makes more sense than increasing the timeout.
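For anyone following along, the relevant knob lives in the endpoint's socket declaration; a sketch of going back to a finite timeout (module names here are placeholders, and the exact values are the documented Phoenix defaults as I understand them):

```elixir
# In the endpoint module (names are placeholders). The :timeout option is
# how long the server keeps a websocket open without receiving traffic.
# Phoenix defaults to 60_000 ms, and the official JS client sends a
# heartbeat every 30_000 ms, so with heartbeats flowing there is no need
# for timeout: :infinity.
socket "/socket", MyAppWeb.UserSocket,
  websocket: [timeout: 60_000]
```

With :infinity, any connection that dies without a clean close can linger server-side, so heartbeats plus a finite timeout also give you more honest connection counts.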

> It could be worth logging the number of processes created per second on the Elixir side to see whether the heartbeat is somehow not making it through. If you see a steady rate of process creation but a ceiling on active connections, it could indicate churn. If no new processes are created after a certain point, it could indicate that the connections stop making it to the server.

Thanks for the feedback, I will try that as well. Now that I have a benchmark, I am trying to optimize the actual app as much as possible on the bare VPS. Once that is done, I will go back to figuring out the issues with k8s.
