Strange behavior of :erlang.trace/3 listening for garbage collection messages

So we’ve have been seeing an issue with memory benchmarks in Benchee recently for functions that use very little memory. I think I have some idea of where the issue is, though, but I’d love some help from anyone that might have any ideas on how to resolve it.

For a quick overview of how we’re measuring memory usage of a function, you can check out a writeup here: http://devonestes.com/using-erlang-trace-3

So, while looking at memory usage data for a function that use just a little bit of memory, we’re not getting any tracing messages that indicate garbage collection is happening. And yet, when I check the process’s garbage collection info immediately after executing the function (using Process.info/2), I see this:

[
  old_heap_block_size: 610,
  heap_block_size: 376,
  mbuf_size: 0,
  recent_size: 100,
  stack_size: 14,
  old_heap_size: 58,
  heap_size: 177,
  bin_vheap_size: 0,
  bin_vheap_block_size: 46422,
  bin_old_vheap_size: 0,
  bin_old_vheap_block_size: 46422
]

The fact that the heap_block_size has been increased from 233 to 376 words, and that there’s an old_heap that’s been allocated and now has data on it, indicates that garbage collection happened, but our tracer process was never sent a message. I tried flushing out the process mailbox for our tracer to see if maybe we were just missing the message with pattern matching that was too selective, and there was still nothing in there after the function was run. I also tried on a much larger function that generated a lot of data, and I did see plenty of garbage collection messages sent to our tracer in that case, both minor and major GC runs.

So, is it possible that the tracer is configured incorrectly in a way that would lead us to be missing tracing messages? When we set the tracer we’re using :erlang.trace(pid, true, [:garbage_collection, tracer: self()]).

This would explain the occasional negative measurements we’ve seen, as there might have been some data on the heap that was collected away, but we were never received any messages about it. Also, this behavior is showing up in both OTP 20 and 21, but it appears to be more pronounced in OTP 21.

Any further ideas or help would be very appreciated!

4 Likes

Might be good to make a minimal reproducible test-case in Erlang and submitted it to the OTP tracker…

1 Like