Ecto/Poolboy queueing

I am using Tsung to load test a Phoenix channel that performs quite a few db queries when handling messages. When I increase the number of concurrent clients to about 5000, each sending a message every few seconds, I start to see that db queries take longer. Ecto logs the “queue”, “query” and “decode” times individually, and the delay is caused exclusively by the “queue” component. While the CPU load on the web host is high, the db host does not see significant load. Queue times grow to as much as a few seconds.

Using the observer I can see that the MsgQ of the Repo.Pool process is filling up. Does anyone know

a) whether this is a sign that no more db connections are available in the pool? Increasing the pool size (see the config sketch after this list) did not have a significant effect, though.
b) whether this could be caused by the high CPU load? Could it be that the scheduler does not assign enough CPU time to the Ecto.Pool process to handle its messages?
c) what else might help to improve this situation?
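
For reference, this is roughly the knob I turned, assuming Ecto 2.x option names; `:my_app` and `MyApp.Repo` are placeholders for the real app and repo:

```elixir
# config/prod.exs -- :my_app / MyApp.Repo are placeholder names
config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  pool_size: 40,        # raised from the default of 10; had little effect for me
  pool_timeout: 5_000,  # how long a caller waits in the queue for a connection
  timeout: 15_000       # how long a single query may take once checked out
```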

Generally, I have the impression that handling messages via Phoenix channels is quite CPU intensive. Even with simple endpoints that only do trivial in-memory computations, 10,000 concurrent users quickly take up to 80% of the CPU resources on a server with eight cores.

I have found that disabling Ecto and Phoenix channel logging improved things. Even when the log level was set to :warn, both still send debug messages to the Logger process, where they are merely discarded, and the MsgQ of the Logger process was filling up.
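
In case it helps others, this is roughly the shape of the change, assuming Ecto 2.x (which uses the `:loggers` repo option) and placeholder app/repo names. Note that compile-time purging only affects code compiled after the config is set, so dependencies may need a recompile, and newer Phoenix versions also expose `log_join`/`log_handle_in` options on `use Phoenix.Channel` to silence the per-message channel logs:

```elixir
# config/prod.exs -- :my_app / MyApp.Repo are placeholder names
# Ecto 2.x: an empty :loggers list means the repo never builds or sends
# per-query log entries, instead of sending debug entries that get discarded.
config :my_app, MyApp.Repo,
  loggers: []

# Keep the runtime level at :warn as before; compile-time purging additionally
# strips Logger.debug/info calls from compiled code so nothing is sent at all.
config :logger,
  level: :warn,
  compile_time_purge_level: :info
```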

Any other performance improvements others might have come across?

Thanks!

What was the size of the VM run queue during the tests? This can indicate whether the server was overloaded in general or whether the problem is local to the repo pool.
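
Something like this, run in a remote console while the test is going, gives both numbers to compare; the pool process name is whatever you saw in observer, `MyApp.Repo.Pool` here as a placeholder:

```elixir
# Number of processes/ports ready to run across all scheduler run queues.
# A persistently large value means the node as a whole is CPU-bound.
run_queue = :erlang.statistics(:run_queue)

# Message queue length of the repo pool process, for comparison.
{:message_queue_len, pool_queue} =
  MyApp.Repo.Pool |> Process.whereis() |> Process.info(:message_queue_len)

IO.puts("run queue: #{run_queue}, repo pool MsgQ: #{pool_queue}")
```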

You could also try working with the sojourn pool - it should handle overload better.
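
A rough sketch of the switch, assuming Ecto 2.x / db_connection 1.x, where the broker-based pool ships as `DBConnection.Sojourn` and requires adding the `:sbroker` package to your deps (`{:sbroker, "~> 1.0"}`); app and repo names below are placeholders:

```elixir
# config/prod.exs -- drop-in replacement for the default poolboy pool
config :my_app, MyApp.Repo,
  pool: DBConnection.Sojourn,  # broker-based pool designed to cope with overload
  pool_size: 40
```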
