I’m optimizing an application that uses Riak Core, GenStage, and Phoenix PubSub. I’ve set up a load test that takes the application near the pod restart limit (on Kubernetes), and I notice a large number of `busy_dist_port` monitor messages.
After some searching, I found the `+zdbbl` flag in the Erlang documentation, with a default value of 1024 (the unit is kilobytes, so 1 MB). I noticed that on the Riak KV performance page the default for the distribution buffer is 32 MB, so I updated mine to 32 MB, and 99.9% of the monitor messages related to `busy_dist_port` are gone.
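For reference, a minimal sketch of how the 32 MB limit can be set and then checked, assuming the node is started with a `vm.args` file (the file location is an assumption and depends on how the release is built). Since `+zdbbl` takes kilobytes, 32 MB is 32 × 1024 = 32768:

```elixir
# In vm.args (path is an assumption; a Mix release typically uses rel/vm.args.eex):
#
#   +zdbbl 32768
#
# From a remote IEx shell on the running node, read back the effective limit.
# :erlang.system_info/1 reports this value in bytes, not kilobytes.
limit_bytes = :erlang.system_info(:dist_buf_busy_limit)
IO.puts("dist_buf_busy_limit: #{div(limit_bytes, 1024)} KB")
```

Reading the value back this way is a quick check that the flag actually reached the VM and was not dropped somewhere in the container or release configuration.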
The monitor logs that remain are the following:
[info] monitor busy_dist_port <0.7799.0> [{initial_call,{'Elixir.GenStage',init,1}},{almost_current_function,{ets,match,2}},{message_queue_len,0}] {#Port<0.364>,unknown}
where `almost_current_function` can have different values:
- {almost_current_function,{ets,match,2}}
- {almost_current_function,{erts_internal,dsend_continue_trap,1}}
- {almost_current_function,{ets,lookup_element,3}}
- {almost_current_function,{erlang,bif_return_trap,2}}
- {almost_current_function,{pg2,'-group_members/1-lc$^0/1-0-',1}}
I think these are related to too much load, and there is probably nothing I can do about them.
The other one is for `large_heap`:
[info] monitor large_heap <0.4933.0> [{name,'Elixir.pubsub_name.Adapter'},{initial_call,{'Elixir.Phoenix.PubSub.PG2',init,1}},{almost_current_function,{erlang,bif_return_trap,2}},{message_queue_len,771}]
This one also shows different values in `almost_current_function`:
- {gen_server,handle_msg,6}
- {'Elixir.Phoenix.PubSub',local_broadcast,4}
- {erlang,bif_return_trap,2}
- {ets,lookup_element,3}
Is there some optimization recommendation that I’m missing? These should be the standard monitor messages when the server is under extreme load, right?
EDIT: The `pool_size` used for PubSub in this project was set to 10 for a 2.5-core machine (2500m), while the PubSub documentation recommends 1 partition per 4 cores. After changing it to a proper number (1), the `large_heap` monitor logs disappeared.
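For anyone hitting the same thing, the change amounts to something like the following sketch. It assumes Phoenix.PubSub 2.x started from the application's supervision tree; `MyApp.PubSub` is a placeholder name, not the one used in this project:

```elixir
# Hypothetical supervision tree entry; MyApp.PubSub is a placeholder name.
children = [
  # :pool_size is the number of PubSub partitions to start. With a 2.5-core
  # pod and the "1 partition per 4 cores" guideline, 1 is the value used here.
  {Phoenix.PubSub, name: MyApp.PubSub, pool_size: 1}
]
```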