My team is looking to use Phoenix Presence in production in our upcoming project. We’d like to hear from anyone who has used Presence in a production system, and at what scale.
I’ve looked through this list of companies using Elixir & Phoenix, but I’m unsure how many have actually used the Presence feature. My assumption is that shops like Dockyard and other consultancies have used it in production.
We use it but we only have a few thousand employees at most.
And I’m assuming it just worked for you as advertised? Any issues you came across?
Yep, it worked as I expected. No issues so far.
I’m using it in a production app and have not run into problems so far, scaling to tens of thousands of concurrent connections.
Thanks, this has been a big help.
@mgwidmann did you ever find more companies using Presence in production? I’m considering using it for a production application as well and am looking for more examples of people using it in the wild.
I did some quick benchmarks on DigitalOcean and found that with a cluster of 3 Elixir nodes I could handle 150k concurrent sessions, with ~2.2GB of RAM used on each node and ~50% CPU usage on the Elixir nodes. I had each session in its own room (my production use case has very few users subscribed to the same topic). I could also check whether a given user was online in ~17µs (this didn’t seem to change as I ramped up the number of connections).
For my use-case I think 150k users is plenty for the foreseeable future, but wanted to see if anyone had pushed it beyond that.
Details of my benchmark setup can be found in this gist. I followed a similar pattern to the “Road to 2MM websocket connections” article.
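For anyone wanting to sanity-check those figures against their own capacity needs, here is a rough back-of-envelope sketch in Python. It only uses the numbers quoted above (3 nodes, 150k sessions, ~2.2GB per node, ~17µs per lookup). The function names are made up for illustration, and it assumes sessions were spread evenly across nodes and that all RAM is attributable to sessions, which overstates the per-session cost since the VM itself has a baseline.

```python
# Back-of-envelope figures derived from the benchmark numbers quoted above.
# Assumptions (not measured): sessions are spread evenly across nodes,
# 1 GB = 1000 MB, and all RAM is attributed to sessions (an upper bound).

NODES = 3
TOTAL_SESSIONS = 150_000
RAM_PER_NODE_MB = 2_200       # ~2.2GB used on each node
LOOKUP_MICROSECONDS = 17      # ~17µs per "is this user online?" check


def sessions_per_node(total_sessions, nodes):
    """Average number of sessions each node holds, assuming an even spread."""
    return total_sessions / nodes


def kb_per_session(ram_per_node_mb, nodes, total_sessions):
    """Cluster-wide RAM divided by session count, in KB (decimal units)."""
    total_kb = ram_per_node_mb * nodes * 1000
    return total_kb / total_sessions


def lookups_per_second(lookup_microseconds):
    """Sequential online checks per second at the quoted per-lookup latency."""
    return 1_000_000 / lookup_microseconds


if __name__ == "__main__":
    print("sessions per node:", sessions_per_node(TOTAL_SESSIONS, NODES))
    print("KB per session (upper bound):",
          kb_per_session(RAM_PER_NODE_MB, NODES, TOTAL_SESSIONS))
    print("sequential lookups per second:",
          round(lookups_per_second(LOOKUP_MICROSECONDS)))
```

Treat the per-session figure as a ceiling rather than a prediction: payload size, number of subscribers per topic, and cluster size will all shift it.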
@mmmries would you mind posting the results in that gist too?
This is really great data, thanks!
I spent a moment staring at the graph, wondering why the rate at which new users were connecting dropped off like that. I presume it is because all the users had connected?
The dropoff at the end is because all users had joined (I had to cap the number of users based on the number of client machines). The dip in the middle of the graph seemed to correlate to the tsung node maxing out its CPUs. I’m guessing it was busy recording latency information, error rates etc.
The CPU on the Phoenix server was ~40-50% during the entire benchmark, so we are probably seeing a reasonable approximation of best performance, though it almost certainly could have handled more.
BTW, the problem of maxing out the client's capabilities is a classic in load testing and explains 99% of the problems with it.