Profiling memory usage in a given process

erlang
beam
profiling

#1

I’ve got some misbehaving requests: the processes handling them suddenly allocate a lot of memory, which can crash my BEAM VM.

I can monitor and detect that, and I can also enforce a maximum memory usage per process, so that’s not an issue.
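For the "enforce max memory per process" part, one option (a sketch, not necessarily what the poster uses) is the max_heap_size process flag available since OTP 19. The module name HeapLimit and the 5-second timeout are hypothetical. Note that the size is given in machine words, not bytes, and the limit is checked during garbage collection.

```elixir
defmodule HeapLimit do
  # Hypothetical helper: runs `fun` in a fresh process whose heap is capped
  # at `max_words` machine words (not bytes) via the :max_heap_size option.
  def run(fun, max_words, timeout \\ 5_000) do
    parent = self()

    {pid, ref} =
      :erlang.spawn_opt(
        fn -> send(parent, {self(), fun.()}) end,
        [
          :monitor,
          # kill: true makes the VM kill the process once the cap is exceeded.
          # error_logger: true would also log a report; it is off here to
          # keep the output quiet.
          {:max_heap_size, %{size: max_words, kill: true, error_logger: false}}
        ]
      )

    receive do
      {^pid, result} ->
        Process.demonitor(ref, [:flush])
        {:ok, result}

      {:DOWN, ^ref, :process, ^pid, reason} ->
        # A process killed for exceeding the cap exits with reason :killed.
        {:error, reason}
    after
      timeout ->
        Process.exit(pid, :kill)
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end
```

Running a misbehaving request body through such a wrapper turns a VM-wide out-of-memory crash into an `{:error, :killed}` for that one process.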

What I would like to know, however, is what is eating up the memory. Ideally I’d be able to take a snapshot of the memory used by the current process and have it dumped to a file of Erlang terms, which I could later analyse to find out what’s on the stack, what’s on the heap, etc.

Is there something out there I can use to achieve that?


#2

A few things occur to me as to why that may be happening:

cast them out!

Are you using cast?
We had an issue some time ago where we were flooding a process with messages via cast, which caused its mailbox to grow without bound until it crashed the BEAM VM.
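To see the effect described above, here is a small sketch: a deliberately slow GenServer (SlowServer, MailboxCheck and the 50 ms delay are made up for illustration) is flooded with GenServer.cast/2, and Process.info/2 reports the growing :message_queue_len. A cast never waits for the server, so nothing slows the sender down, whereas GenServer.call/3 would apply back-pressure.

```elixir
defmodule SlowServer do
  # Hypothetical server that takes `delay_ms` to handle every cast.
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  @impl true
  def handle_cast(:work, delay_ms) do
    Process.sleep(delay_ms)
    {:noreply, delay_ms}
  end
end

defmodule MailboxCheck do
  # Hypothetical helper: number of messages waiting in `pid`'s mailbox,
  # or nil if the process is no longer alive.
  def queue_len(pid) do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, len} -> len
      nil -> nil
    end
  end
end
```

Sending 1,000 casts to a server started with `SlowServer.start_link(50)` leaves almost all of them queued, since each one takes 50 ms to process.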

queues

Using queues?
Queues that grow without bound (looking at you, hackney) misbehave spectacularly, consuming all of the system’s memory until a crash is inevitable.

https://ferd.ca/handling-overload.html
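One common remedy is to cap the queue and shed load once it is full instead of letting it grow. Here is a sketch under assumptions: the module BoundedQueue and its drop-newest policy are illustrative, not taken from hackney or the linked article. The length is tracked separately because :queue.len/1 is O(n).

```elixir
defmodule BoundedQueue do
  # Hypothetical FIFO queue with a hard cap; new items are dropped when full.
  defstruct q: :queue.new(), len: 0, max: 1_000

  def new(max) when is_integer(max) and max > 0, do: %__MODULE__{max: max}

  # Refuse the item instead of growing past `max` (load shedding).
  def push(%__MODULE__{len: len, max: max} = bq, _item) when len >= max,
    do: {:dropped, bq}

  def push(%__MODULE__{q: q, len: len} = bq, item),
    do: {:ok, %{bq | q: :queue.in(item, q), len: len + 1}}

  def pop(%__MODULE__{len: 0} = bq), do: {:empty, bq}

  def pop(%__MODULE__{q: q, len: len} = bq) do
    {{:value, item}, rest} = :queue.out(q)
    {{:ok, item}, %{bq | q: rest, len: len - 1}}
  end
end
```

Whether to drop the newest item, drop the oldest, or reject the caller outright depends on the workload; the point is that memory use stays bounded by `max`.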

Tools you can use

If you want a specific tool, I know of erlyberly, which shows you memory usage and the messages passed between processes.

You can also find other tools in this discussion:


#3

I’ve already figured out why and what was happening. It was this bug in Absinthe we stumbled upon: https://github.com/absinthe-graphql/absinthe/issues/569

I’d like to know if there’s something that will help me track down this and similar issues next time :slight_smile:


#4

I had a look at http://ferd.github.io/recon/ and Erlang’s built-in tools, but I don’t think there’s anything that lets me just glance at the memory and figure out what exactly is eating it up…


#5

How ‘exactly’ do you want it? :etop.start() is pretty useful (make sure to run it in a new console), and so is observer. :slight_smile:
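If you would rather get the same "sorted by memory" view programmatically (for instance from a remote shell), here is a rough sketch that uses only Process.list/0 and Process.info/2. TopMem is a hypothetical name; if recon is installed, :recon.proc_count(:memory, n) does something similar.

```elixir
defmodule TopMem do
  # Hypothetical helper; :memory is reported in bytes.
  @keys [:memory, :registered_name, :current_function, :message_queue_len]

  # Returns the `n` processes using the most memory, as {pid, info} tuples.
  def top(n \\ 10) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      # Process.info/2 returns nil if the process died in the meantime.
      case Process.info(pid, @keys) do
        nil -> []
        info -> [{pid, info}]
      end
    end)
    |> Enum.sort_by(fn {_pid, info} -> info[:memory] end, :desc)
    |> Enum.take(n)
  end
end
```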

As for what is on the stack… unsure…


#6

That can come in handy, although I don’t think it actually does what I want. You can do a similar thing in observer by sorting processes by memory, but it’s nice to have this in case Erlang on the server has no X support.


#7

You can run observer locally and connect to a remote node to view its information. Here’s a blog post about it: http://jbavari.github.io/blog/2016/03/11/using-erlang-observer-on-a-remote-elixir-server/


#8

Alternatively, have a look at observer_cli. It’s a terminal version of observer, which is fantastic when no X is available (and in many cases I actually prefer its interface over the graphical observer anyway).


#9

But to respond to both of you and @axelson: this won’t help me when I already know precisely which process is consuming the memory, will it? I would like to know what is consuming the memory, and unless I am missing some functionality, observer does not give me any insight into what’s in the memory itself.


#10

:etop does tell you about the memory consumption of a process.

The problem, though: if it is not the process’s heap + stack but references into the binary heap, then you are lost. Space leaks on the binary heap are hard to debug because of its ref-counting nature. Even a single "a" produced by slicing into a gigabyte JSON file can keep the full JSON alive forever!
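A small illustration of that effect follows. The 1 MB binary stands in for the JSON document, and detach/1 is a hypothetical helper modelled on the example in the Erlang documentation for binary:referenced_byte_size/1. A slice keeps its whole parent alive, and :binary.copy/1 breaks that link. The 2x threshold is an arbitrary choice.

```elixir
defmodule BinLeak do
  # Hypothetical helper: if `bin` is a slice that keeps a much larger binary
  # alive, return a standalone copy so the large one can be freed.
  def detach(bin) when is_binary(bin) do
    if :binary.referenced_byte_size(bin) > 2 * byte_size(bin) do
      :binary.copy(bin)
    else
      bin
    end
  end
end

# Stand-in for a large document: a 1 MB off-heap (refc) binary.
big = :binary.copy("x", 1_000_000)
# A 1 KB slice that still references all of `big`.
slice = binary_part(big, 0, 1_000)
IO.inspect(:binary.referenced_byte_size(slice), label: "slice references")
IO.inspect(:binary.referenced_byte_size(BinLeak.detach(slice)), label: "copy references")
```

Per process, Process.info(pid, :binary) lists the refc binaries a process holds references to, which is one way to spot this kind of leak.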


#11

Do you mean that etop gives me insight into how much memory the process consumes (which it does), or can I also get insight into what consumes that memory? I can’t find the latter.


#12

It tells you only how much memory is consumed by a process.

I doubt there will be anything that tells you the actual what.


#13

Alright. The closest thing to what I want would be crashing the Erlang VM and producing a dump file that I could analyse, but that’s a pretty scary thing to do in a production environment. Also, the resulting dump will not be scoped to the process I identified but to the whole VM (which can be a good or bad thing depending on how you look at it).


#14

There are a few things you can do. Observer (and observer_cli) let you inspect what processes are currently doing. In observer_cli, if you order the processes by memory usage and want to inspect the first one, simply press 1. You can then, for example, inspect the process’s messages or its state.

You can do the same from iex. To get the state of a process, use :sys.get_state(pid). Via Process.info(pid) (or, better, Recon.info(pid), which disables some of the unsafe inspections) you can look at, for example, the current stacktrace, the process dictionary, messages, etc.
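Combining those calls gives something close to the "dump a snapshot to a file of Erlang terms" idea from the first post. Here is a minimal sketch, assuming :erlang.term_to_binary/1 is an acceptable file format. ProcSnapshot and the chosen keys are hypothetical, and this captures metadata plus (for OTP processes) the state, not a raw heap dump. For a non-OTP process :sys.get_state/2 simply times out, and the unanswered system message stays in that process's mailbox.

```elixir
defmodule ProcSnapshot do
  # Hypothetical snapshot helper. :memory is in bytes; :heap_size,
  # :stack_size and :total_heap_size are in words. :messages is left out on
  # purpose, because copying a huge mailbox can itself blow up memory.
  @keys [
    :registered_name, :initial_call, :current_stacktrace, :memory,
    :heap_size, :stack_size, :total_heap_size, :message_queue_len,
    :dictionary, :binary
  ]

  def take(pid, timeout \\ 1_000) do
    state =
      try do
        {:ok, :sys.get_state(pid, timeout)}
      catch
        # Non-OTP processes never answer, so this ends in a timeout exit.
        :exit, reason -> {:error, reason}
      end

    %{
      pid: pid,
      taken_at: System.system_time(:millisecond),
      info: Process.info(pid, @keys),
      state: state
    }
  end

  # Writes the snapshot in the external term format for later analysis.
  def dump!(pid, path), do: File.write!(path, :erlang.term_to_binary(take(pid)))

  def load!(path), do: path |> File.read!() |> :erlang.binary_to_term()
end
```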


#15

I would say this is never necessary if you already know which process is causing the issue. As mentioned above, you can start by digging into that exact process. Another thing to look into is garbage collection. You can use :erlang.garbage_collect to force garbage collection of a process. Especially if the process is long-running, tuning garbage collection may be necessary.
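To tell whether a process is holding garbage or live data, one option is to compare Process.info(pid, :memory) before and after a forced :erlang.garbage_collect/1. This is a sketch, and GcCheck is a made-up name. If the number barely drops, the memory is live data that the process is still referencing.

```elixir
defmodule GcCheck do
  # Hypothetical helper: forces a full GC of `pid` and reports its memory
  # (in bytes) before and after. Returns nil if the process is not alive.
  def gc_and_measure(pid) do
    with {:memory, before_gc} <- Process.info(pid, :memory),
         true <- :erlang.garbage_collect(pid),
         {:memory, after_gc} <- Process.info(pid, :memory) do
      %{before_gc: before_gc, after_gc: after_gc, freed: before_gc - after_gc}
    else
      _ -> nil
    end
  end
end
```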


#16

AFAIK, it’s not possible to get what you’re looking for out of the box. To get some clues, you could use a combination of the following:

  • memory usage of the process
  • process stack trace
  • message queue length
  • total binary memory usage
  • total ETS memory usage

In case you don’t know which process is the offender, you might also need to include :initial_call and :registered_name (both available via Process.info).

Coupled with the knowledge of the code, this should help you narrow down the problem.
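Most items on that list map to a single call each. Here is a minimal sketch, where MemClues is a hypothetical module: per-process figures come from Process.info/2, and VM-wide binary and ETS totals come from :erlang.memory/1. For processes started through proc_lib (GenServers and the like), :initial_call usually shows a proc_lib function, so the sketch also asks :proc_lib.translate_initial_call/1 for the real module.

```elixir
defmodule MemClues do
  # Hypothetical helper collecting the clues listed above.
  @proc_keys [:memory, :current_stacktrace, :message_queue_len, :initial_call, :registered_name]

  # Per-process clues; nil if the process is gone.
  def process(pid) do
    case Process.info(pid, @proc_keys) do
      nil -> nil
      info -> Keyword.put(info, :proc_lib_initial_call, :proc_lib.translate_initial_call(pid))
    end
  end

  # VM-wide totals, in bytes.
  def system do
    %{
      total: :erlang.memory(:total),
      processes: :erlang.memory(:processes),
      binary: :erlang.memory(:binary),
      ets: :erlang.memory(:ets)
    }
  end
end
```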


#17

Right, but if a process is not a nice member of the OTP family, as in it’s a Cowboy protocol handler (i.e. the process in which Phoenix’s web requests are handled), and memory is being allocated within that same process, then there is nothing that will give me insight into the allocated memory. That’s sad; I know Java has tools that can do precisely that, so it’s a shame we don’t have the same :slight_smile:


#18

Maybe :erlang.system_info(:procs) is closer to what you are looking for? From it you can get the values that are currently live on the stack.


#19

The stack trace should help you narrow down the problematic code. We actually had a similar situation recently, and we ended up periodically logging the processes with unusually high memory usage, together with their stack traces. In that particular situation, this info was enough to completely understand the root cause. I agree that it’s not quite what you’re looking for, but it can help you understand the problem.
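The periodic logging described above might look roughly like the sketch below. The module name, the 50 MB threshold and the 10-second interval are assumptions, not the poster's actual values. Logger.warning/1 needs Elixir 1.11 or later.

```elixir
defmodule HighMemLogger do
  # Hypothetical periodic scanner: logs every process above a memory
  # threshold together with its current stack trace.
  use GenServer
  require Logger

  @keys [:memory, :current_stacktrace, :registered_name, :initial_call, :message_queue_len]

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Returns [{pid, info}] for every process whose memory (bytes) >= threshold.
  def scan(threshold) do
    Process.list()
    |> Enum.map(fn pid -> {pid, Process.info(pid, @keys)} end)
    |> Enum.filter(fn {_pid, info} -> info != nil and info[:memory] >= threshold end)
  end

  @impl true
  def init(opts) do
    state = %{
      # Assumed defaults; tune for your system.
      threshold: Keyword.get(opts, :threshold, 50 * 1024 * 1024),
      interval: Keyword.get(opts, :interval, 10_000)
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:scan, state) do
    for {pid, info} <- scan(state.threshold) do
      Logger.warning("high memory process #{inspect(pid)}: #{inspect(info)}")
    end

    schedule(state.interval)
    {:noreply, state}
  end

  defp schedule(ms), do: Process.send_after(self(), :scan, ms)
end
```

Adding `HighMemLogger` to a supervision tree would start the periodic scan; scan/1 can also be called by hand from a remote shell.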

It’s also worth keeping in mind that, in the general case, excessive memory usage can be caused by other things, such as messages accumulated in the process queue, refc binaries, or ETS tables. So finding the cause is not necessarily as straightforward as logging the current heap of the process, which is why I mentioned a few different things in my previous answer.
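For the ETS part, here is a quick sketch for finding the biggest tables. EtsTop is a made-up name. :ets.info/2 reports :memory in words, so the sketch converts it to bytes using the VM word size. Tables can disappear between calls, which is why :undefined is handled.

```elixir
defmodule EtsTop do
  # Hypothetical helper: the `n` largest ETS tables with their owner pid.
  def top(n \\ 5) do
    word = :erlang.system_info(:wordsize)

    :ets.all()
    |> Enum.flat_map(fn tab ->
      case :ets.info(tab, :memory) do
        # The table was deleted after :ets.all/0 returned.
        :undefined ->
          []

        words ->
          [%{table: tab, name: :ets.info(tab, :name), owner: :ets.info(tab, :owner), bytes: words * word}]
      end
    end)
    |> Enum.sort_by(& &1.bytes, :desc)
    |> Enum.take(n)
  end
end
```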


#20

Is there a way to get the information from :erlang.system_info(:procs) for only a single process/port?
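As far as I know there is no documented per-process variant. Below is a workaround sketch that rests on stated assumptions. :erlang.system_info(:procs) returns one crash-dump-style text blob for all processes and ports. The assumption is that each process section starts with a line of the form =proc:<0.123.0>, and the hypothetical section_for/1 simply cuts the blob apart on that header. Because the call walks every process on the node, it may be expensive on a busy production system.

```elixir
defmodule ProcsDump do
  # Hypothetical helper. Whole crash-dump-style text for every process and
  # port; IO.iodata_to_binary/1 normalises the result in case it is a charlist.
  def all, do: IO.iodata_to_binary(:erlang.system_info(:procs))

  # Assumption: sections are separated by lines starting with "=", and a
  # process section begins with "=proc:<pid>". Returns nil if not found.
  def section_for(pid) when is_pid(pid) do
    header = "proc:" <> List.to_string(:erlang.pid_to_list(pid))

    all()
    |> String.split(~r/^=/m, trim: true)
    |> Enum.find(&(&1 == header or String.starts_with?(&1, header <> "\n")))
    |> case do
      nil -> nil
      section -> "=" <> section
    end
  end
end
```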