Phoenix Presence mailbox full

sb8244 · November 18, 2018, 9:34pm

I’ve recently been working on a load test for an open source push server that I’m building. One of the components of the load test has been testing different channel distributions.

When I run a topic using a unique ID, the test is able to max out my agents (64k conns per agent) without trouble (3 agents maxed out).

When I run a topic with a more constrained topic spread, I run into the Presence_shard0 timeout like this thread mentions. The example I have set up hits the ceiling very quickly with 100 topics, within 20k connections (1500 connections/s). The mailbox usually has over 70k messages when this occurs.
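
To make the difference between the two spreads concrete, here is a rough Python sketch of the arithmetic only. It is not the load test itself: the topic names (`user:<id>`, `room:<n>`) and the modulo assignment are hypothetical, chosen just to illustrate "one topic per connection" versus "100 shared topics" at the 20k-connection ceiling.

```python
from collections import Counter

CONNECTIONS = 20_000      # ceiling observed with the constrained spread
TOPIC_COUNT = 100         # number of shared topics in the constrained spread
CONNECT_RATE = 1_500      # connections per second


def unique_topic(conn_id: int) -> str:
    # Unique spread: every connection joins its own topic (hypothetical naming).
    return f"user:{conn_id}"


def constrained_topic(conn_id: int, topic_count: int = TOPIC_COUNT) -> str:
    # Constrained spread: connections share a fixed pool of topics
    # (hypothetical round-robin assignment).
    return f"room:{conn_id % topic_count}"


unique = Counter(unique_topic(i) for i in range(CONNECTIONS))
constrained = Counter(constrained_topic(i) for i in range(CONNECTIONS))

# Seconds of ramp-up before the ceiling is reached at the given rate.
ramp_seconds = CONNECTIONS / CONNECT_RATE

print(len(unique), max(unique.values()))            # 20000 1
print(len(constrained), max(constrained.values()))  # 100 200
print(round(ramp_seconds, 1))                       # 13.3
```

In other words, the constrained run puts about 200 connections on each topic and reaches the ceiling roughly 13 seconds into the ramp, while the unique run never has more than one connection per topic.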

Nothing new to report on a solution yet, but I do have this generically reproducible and easily tweakable. Unfortunately I’m still in the process of open sourcing it, but I’m game to try out new ideas or share the problem over screenshare.

The servers are DigitalOcean machines with 8GB mem / 4 vCPUs. When running the unique spread, my CPU maxes out at 50%. When running the constrained spread, I get 80%+ CPU.