Has anyone else observed throttling of Erlang or Elixir processes running in Kubernetes (or other Docker orchestration platforms)?
I observed this in production a while ago (on a system I no longer have access to), but was unable to make headway. Originally, I noticed wide variation in ‘ping’ times when connecting to a do-nothing endpoint while the app was not under load, and then noticed throttling being recorded in the Prometheus stats. I couldn’t find a reason, nor reproduce the problem in minikube etc., but it was a very noisy, busy system, with 30-odd processes of various technologies and loads per node.
Recently I came across what is possibly an explanation. It comes in two parts. The first is a very detailed examination of CPU usage in the BEAM:
The second clue is talk about scheduler bugs in the Linux kernel, and what CFS quotas are supposed to do anyway:
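It occurs to me that when running in a container under a CFS quota, we should all be setting +sbwt none. The CPU time that schedulers spend busy-waiting for work counts against the quota, so turning busy-waiting off should stop this optimisation from getting the Erlang process throttled.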
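For anyone who wants to try this, here is a minimal sketch of one way to pass the flag to a containerised node, assuming the image starts the VM through erl (or an Elixir release script) so that the ERL_FLAGS environment variable is picked up. The two extra flags for dirty schedulers are my addition and only exist on newer OTP releases, so check the erl documentation for your version before relying on them.

```shell
# Disable scheduler busy-waiting so idle schedulers sleep instead of spinning.
# +sbwt      : normal schedulers
# +sbwtdcpu  : dirty CPU schedulers (assumption: your OTP version supports it)
# +sbwtdio   : dirty IO schedulers  (assumption: your OTP version supports it)
export ERL_FLAGS="+sbwt none +sbwtdcpu none +sbwtdio none"

# Show what the VM will be started with.
echo "ERL_FLAGS=${ERL_FLAGS}"
```

The same flags can instead go in a release's vm.args file, one per line. In Kubernetes the variable would normally be set in the container's env section rather than exported in a shell.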
Anyone come across this? Thoughts?