Finding the cause of a stuck BEAM

I have a program that creates multiple connections to MQTT servers. After a certain number of connections are created (>300), the BEAM VM freezes. When starting the observer before creating the connections, we can see the schedulers jump from ~30% utilisation to ~100%, and the BEAM VM is stuck to the point that switching tabs in the observer or entering commands in iex is almost impossible.
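
For reference, scheduler utilisation can also be sampled without the observer; a minimal check from iex, assuming OTP 21+ with the runtime_tools application available:

    # Sample scheduler utilisation over 1 second; returns a :total entry plus
    # one entry per scheduler (part of the runtime_tools application).
    :scheduler.utilization(1)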

I was able to generate an Erlang crash dump and read the states of the processes:

Process States when crashing (sum): 
===
      1 CONNECTED
      1 CONNECTED|BINARY_IO
    351 CONNECTED|BINARY_IO|PORT_LOCK
      1 CONNECTED|BINARY_IO|SOFT_EOF|PORT_LOCK
      1 CONNECTED|DISTR|PORT_LOCK
      1 CONNECTED|SOFT_EOF
      8 Current Process Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING
      8 Current Process Running
      1 Internal ACT_PRIO_HIGH | USR_PRIO_HIGH | PRQ_PRIO_HIGH | OFF_HEAP_MSGQ
      5 Internal ACT_PRIO_MAX | USR_PRIO_MAX | PRQ_PRIO_MAX
      1 Internal ACT_PRIO_MAX | USR_PRIO_MAX | PRQ_PRIO_MAX | OFF_HEAP_MSGQ
   2603 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL
      1 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING
    332 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | IN_PRQ_NORMAL | ACTIVE | IN_RUNQ
      3 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | IN_PRQ_NORMAL | ACTIVE | IN_RUNQ | SIG_IN_Q | ACTIVE_SYS
      1 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | IN_PRQ_NORMAL | ACTIVE | IN_RUNQ | SIG_IN_Q | ACTIVE_SYS | SIG_Q
      1 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | IN_PRQ_NORMAL | IN_RUNQ | SIG_IN_Q | ACTIVE_SYS
      1 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | IN_PRQ_NORMAL | IN_RUNQ | SIG_IN_Q | ACTIVE_SYS | SIG_Q
      5 Internal ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | OFF_HEAP_MSGQ
      1 Running
    338 Scheduled
   2615 Waiting
  1. What is PORT_LOCK? Is it bad?

  2. I have access to the source of the application and its libraries; how can I track down the problem?


Two things spring to mind:

  1. Too much GC (garbage collection) is happening.
  2. (Less likely) the limit for open files/sockets on the system is exhausted; a quick way to check is sketched below.
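
For the second point, a minimal check from the running node, assuming the iex shell is still reachable (OS-level limits such as ulimit -n would have to be checked separately):

    # Compare the number of open ports/processes against the VM limits;
    # all of these are standard :erlang calls.
    IO.inspect(length(:erlang.ports()), label: "open ports")
    IO.inspect(:erlang.system_info(:port_limit), label: "port limit")
    IO.inspect(:erlang.system_info(:process_count), label: "process count")
    IO.inspect(:erlang.system_info(:process_limit), label: "process limit")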

No idea about PORT_LOCK though, maybe somebody else will chime in.

What MQTT client are you using?

Thanks for asking. I am using Elixir Tortoise.

Are you actually connecting to 300 different MQTT Servers? If so, you definitely have an interesting problem.

If not, you should be able to use a single connection to handle many topics.

@gausby is the author of Tortoise, he may have some input as well.

I have never tried connecting 300 connections with Tortoise, so I don’t know if some limit in the BEAM is hit here. I could perhaps ask some of my colleagues tomorrow if they know what could cause this.

But first off—are you connecting 300 instances of Tortoise to the same MQTT server, or do you have 300 MQTT servers running?

Hi, it’s great to have the author of Tortoise here :-). Thank you. I wanted to be sure I pinpointed the problem with some accuracy before opening a GitHub issue.

I have around 340 connections to 340 MQTT servers (running locally with mosquitto). While it may seem strange, each server is used to simulate a different IoT device. That’s why I’m doing it this way.

I would be pleased if you could ask your colleagues. In the meantime I will dig further into the issue. Thank you!

Today I tried monitoring for long_gc and long_schedule, but it did not trigger any messages.
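
For reference, this kind of monitor is set up with :erlang.system_monitor/2; a minimal sketch, with arbitrary 100 ms thresholds rather than the exact values I used:

    # Spawn a process that receives {:monitor, pid, :long_gc | :long_schedule, info}
    # messages whenever a GC pause or an uninterrupted scheduling slot exceeds
    # the configured threshold (in milliseconds).
    defmodule VmMonitor do
      def start do
        spawn(fn ->
          :erlang.system_monitor(self(), long_gc: 100, long_schedule: 100)
          loop()
        end)
      end

      defp loop do
        receive do
          {:monitor, pid, event, info} ->
            IO.inspect({pid, event, info}, label: "system_monitor")
            loop()
        end
      end
    end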

All the traces in the schedulers (except one) are in the unregister_match function, called from the event module, which is called from the connect function of the connection module.
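
For anyone who wants to collect similar traces while the shell is still somewhat responsive, a rough sketch of sampling what the currently running processes are doing (not the exact commands I used):

    # List the current stacktraces of all processes that are in the :running state.
    for pid <- Process.list(),
        pid != self(),
        Process.info(pid, :status) == {:status, :running} do
      {pid, Process.info(pid, :current_stacktrace)}
    end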

I think you may be confusing servers with topics. Each IoT device should be a topic, not a server.

You may also be running into limits of your machine in general if you are running 340 Mosquitto servers.

To simulate load, I would have 1 MQTT server, a script that pushes messages to multiple topics, each topic could be a device/sensor combination, and then one Tortoise client connection to the MQTT Server, with wildcard topic subscriptions for each device id and sensor type.

You could write the script in Elixir and spawn a process for each “device” to simulate concurrent messages over the connection.
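
A rough sketch of what such a script could look like; the client id, topic scheme, and payload here are made-up placeholders, not a tested example:

    # One process per simulated device, all publishing over a single Tortoise
    # connection registered under the (hypothetical) client id "load_test".
    defmodule LoadSim do
      def start(device_count) do
        for device_id <- 1..device_count do
          spawn(fn -> publish_loop(device_id) end)
        end
      end

      defp publish_loop(device_id) do
        payload = :erlang.term_to_binary(%{temperature: :rand.uniform(100)})
        Tortoise.publish("load_test", "devices/#{device_id}/temperature", payload, qos: 0)
        Process.sleep(1_000)
        publish_loop(device_id)
      end
    end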

This would be closer to what you would be doing in a production environment, because your IoT devices won’t be talking directly to your service, but to your MQTT server, which will send the messages on to your service.

And even if you don’t want to use wildcard subscriptions, you could still have multiple topic subscriptions on a single connection.

I see what you are pointing at, but in our case each physical device has an embedded MQTT server to aggregate sensor data. The data is then read by a client on the device. So to simulate the cluster I need to simulate all of these MQTT servers.

So I checked that with htop and all is fine; only the BEAM is stuck.

Interesting, so you have a VPN or some static IP for each device?

Do you have control over the devices themselves? If so, I would still write an aggregator script that pushes the messages from the local server up to a “cloud” server.

Or look at using an MQTT Bridge, http://www.steves-internet-guide.com/mosquitto-bridge-configuration/

One IP for each simulated device; it is done with Linux network namespaces. We don’t have an aggregator or MQTT bridges in production, so I don’t want them in the simulation either.

Wait, you are creating and destroying network namespaces dynamically at a large scale?

We had a similar setup for some simulated network testing a few years ago.

We created and destroyed many hundreds of namespaces per hour, and the system call that should create the next one simply stalled when it came to the nth namespace.

The OS thread responsible for this call just wouldn’t be scheduled by the OS anymore and wasn’t even kill -9able. Even worse, when the parent was killed, the stale child “survived”.

It was not possible to create any further namespaces once this occurred. n was constant per machine across reboots, but differed between the couple of hosts we tested.

Back then my personal funtoo machine survived the most namespaces, somewhere in the tens of millions, whereas most other systems stalled already in the single-digit millions.

We were not able to hunt down the root cause, as the affected client decided to just use a VM that gets thrown away after a couple of those iterations, while still in a safe range of already-created namespaces.

So if you really do work with a huge number of dynamically created and deleted namespaces, check if any of the described symptoms happen to you as well.

PS: We did not use Erlang for that project, though it seemed to be a limitation of the OS, not of Erlang.


That’s exactly what I am doing!
Funny that you have been very active on this forum since I registered, and that we have experienced the same problem.

You are right that there are system limitations with network namespaces that I still need to explore. In this particular case, however, I think it is due to something else: the traces show the code is stalled in the Tortoise lib, not in the code creating the namespaces.

Today I replaced Tortoise with emqtt and I was able to create a much higher number of connections (around 512). Then I got some errors with the network namespaces, BUT the BEAM VM was not stuck this time.
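
For reference, opening a single connection with emqtt looks roughly like this (the host, port, and client id below are placeholders, not my actual per-namespace setup):

    # Start an emqtt client process and perform the MQTT connect handshake.
    {:ok, pid} = :emqtt.start_link(host: 'localhost', port: 1883, clientid: "sim-device-1")
    {:ok, _props} = :emqtt.connect(pid)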

So it shows me that something strange is happening with Tortoise in this setup.

Zombie process!

It happens because programs have a few states they can be in:

  • Active_Scheduleable
  • Sleep_Scheduleable
  • Wait_Scheduleable
  • Wait

Scheduleable means the process is in a coherent enough state that it can receive signals. If it’s not scheduleable, the process is waiting on something deep in the kernel.

The ‘Wait’ state generally only happens on a few very specific kernel primitives; allocating low-level resources such as new network data (not accessing it) is one of them. If the kernel can’t create the resource because it has run out of space for it, the program will wait until resources become available. If the program is itself the one that caused the kernel to run out, then it is a zombie, forever dead and inaccessible, as there isn’t a single signal in the entire system that can reach it, not even kill -9.


@gausby did you get any info from your colleagues? I don’t have the problem with emqtt but would still love to use Tortoise. I will try to write a reproducible example next week.
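
The reproduction will probably boil down to starting a few hundred Tortoise connections in a loop; a rough, untested sketch (the handler and transport options follow the Tortoise README, and the per-namespace broker IPs are replaced by a single placeholder):

    # Start N Tortoise connections; in the real setup each one points at a
    # different broker IP inside its own network namespace.
    for n <- 1..340 do
      {:ok, _pid} =
        Tortoise.Connection.start_link(
          client_id: "sim-#{n}",
          handler: {Tortoise.Handler.Logger, []},
          server: {Tortoise.Transport.Tcp, host: 'localhost', port: 1883}
        )
    end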

These lines are actually Port states, not Process states.

The PORT_LOCK refers to the fact that the port is locked using a port-specific lock instead of a driver lock. This is normal and nothing to worry about.
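
If it helps to correlate those counts with what is actually open, a quick (rough) way to tally the open ports by driver name at runtime:

    # Count open ports per driver name, e.g. how many tcp_inet ports exist.
    Port.list()
    |> Enum.map(&:erlang.port_info(&1, :name))
    |> Enum.frequencies()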
