Profiling memory usage in given process

hubertlepicki · March 14, 2019, 2:23pm

I’ve got some requests misbehaving, and the processes handling them suddenly allocate a lot of memory, which can crash my BEAM VM.

I can monitor and detect that, I can also enforce max memory usage by process, that’s not an issue.

I would like to, however, know what is eating up the memory. Ideally I’d be able to take a snapshot of the memory used by the current process and have it dumped to a file containing Erlang terms, which I could later analyse to find out what’s on the stack, what’s on the heap, etc.

Is there something out there I can use to achieve that?

Fl4m3Ph03n1x · March 14, 2019, 2:30pm

A few things occur to me as to why that may be happening:

`cast` them out!

Are you using cast?
We had an issue some time ago where we were flooding a process with messages via cast, which caused its mailbox to grow indefinitely until it crashed the BEAM VM.

queues

Using queues?
Queues that grow without bound (looking at you, hackney) misbehave spectacularly, consuming all the system’s memory until a crash is inevitable.

https://ferd.ca/handling-overload.html

Tools you can use

If you want a specific tool, then I know of erlyberly, which shows you memory and messages passed between processes.

You can also find other tools in this discussion:

hubertlepicki · March 14, 2019, 2:39pm

I already solved the issue, as in I know why and what is happening. It was this bug in absinthe we stumbled upon: https://github.com/absinthe-graphql/absinthe/issues/569

I’d like to know if there’s something that’ll help me track down similar issues next time.

hubertlepicki · March 14, 2019, 2:41pm

I had a look at http://ferd.github.io/recon/ and Erlang’s internal tools but I don’t think there’s anything that will allow me to just have a glance at the memory and figure out what exactly eats it up…

OvermindDL1 · March 14, 2019, 2:43pm

How ‘exactly’ do you want it? :etop.start() is pretty useful (make sure to use a new console) or observer.

As for what is on the stack… unsure…

hubertlepicki · March 14, 2019, 4:23pm

That can come in handy, although I don’t think it actually does what I want. I think you can do a similar thing from observer if you sort processes by memory, but it is cool to have this in case Erlang on the server has no X support.

axelson · March 15, 2019, 9:04pm

You can run observer locally and connect to a remote node to view its information. Here’s a blog post about it: http://jbavari.github.io/blog/2016/03/11/using-erlang-observer-on-a-remote-elixir-server/

arnomi · March 17, 2019, 4:12pm

Alternatively, have a look at observer_cli. It’s a terminal version of observer, which is fantastic when no X is available (and in many cases I actually prefer its interface over the graphical observer anyway).

hubertlepicki · March 18, 2019, 9:01am

But to respond to both of you and @axelson: this won’t help me when I already know precisely which process is consuming the memory. I would like to know what is consuming the memory, and unless I am missing some functionality, observer does not give me insight into what’s in the memory itself.

NobbZ · March 18, 2019, 9:16am

:etop does tell you about the memory consumption of a process.

Problem though: if it is not the process’s heap + stack but some references to the bin-heap, then you are lost. Space leaks on the bin heap are hard to debug due to its ref-counting nature. Even a single "a" which was produced by slicing into a gigabyte JSON file can keep the full JSON alive forever!
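A minimal sketch of the sub-binary effect described above. `:binary.referenced_byte_size/1` and `:binary.copy/1` are standard library functions; the sizes are arbitrary. Whether a very small slice still references its parent can depend on the OTP version, so the sketch uses a 100-byte slice and makes no claim about the exact number printed for it.

```elixir
# Build a ~1 MB binary at runtime (stands in for the large JSON payload).
big = :binary.copy("x", 1_000_000)

# Slicing does not necessarily copy: the slice may be a sub-binary pointing into `big`.
slice = binary_part(big, 0, 100)

# How many bytes does the slice keep reachable? This may be the size of `big`.
IO.inspect(:binary.referenced_byte_size(slice), label: "referenced by slice")

# An explicit copy detaches the data from the parent binary.
copy = :binary.copy(slice)
IO.inspect(:binary.referenced_byte_size(copy), label: "referenced by copy")
```

If a long-lived process only needs the small piece, storing the result of `:binary.copy/1` instead of the slice lets the large parent binary be released.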

hubertlepicki · March 18, 2019, 9:33am

Do you mean that etop gives me insight into the memory consumption of the process (which it does), or can I also get insight into what consumes the memory? I can’t find the latter.

NobbZ · March 18, 2019, 10:26am

It tells you only how much memory is consumed by a process.

I doubt there will be anything that tells you the actual what.

hubertlepicki · March 18, 2019, 10:35am

Alright. The closest thing to what I want would be crashing the Erlang VM and producing a dump file that I could analyse, but that’s a pretty scary thing to do in a production environment. Also, the resulting dump will not be scoped to the process I identified but to the whole VM (which can be a good or bad thing depending on how you look at it).

arnomi · March 18, 2019, 11:50am

There are a few things you can do. Observer (and observer_cli) allow you to inspect what processes are currently doing. In observer_cli, if you ordered the processes by memory usage and you want to inspect the first one, simply press 1. You can then, for example, inspect the messages of the process or its state.

You can do the same from iex. To get the state of a process, use :sys.get_state(pid). And via Process.info(pid) (or better, :recon.info(pid), which disables some of the unsafe inspections) you can look into, for example, the current stacktrace, process dictionary, messages etc.
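A rough sketch of that kind of inspection from iex. The Agent is a hypothetical stand-in for the suspicious process; `:sys.get_state/1` only works for OTP-compliant processes, and asking `Process.info/2` for `:messages` would copy the whole mailbox, so the sketch asks for `:message_queue_len` instead.

```elixir
# Hypothetical target: a throwaway Agent standing in for the suspicious process.
{:ok, pid} = Agent.start_link(fn -> %{cache: Enum.to_list(1..1_000)} end)

# State of an OTP-compliant process (GenServer, Agent, ...). Intended for debugging only.
state = :sys.get_state(pid)
IO.inspect(Map.keys(state), label: "state keys")

# Request specific items rather than everything, to limit how much data is copied.
info = Process.info(pid, [:memory, :message_queue_len, :current_stacktrace, :dictionary])
IO.inspect(info[:memory], label: "memory (bytes)")
IO.inspect(info[:current_stacktrace], label: "stacktrace")
```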

arnomi · March 18, 2019, 11:53am

I would say, this is never necessary if you already know which process is causing the issue. As mentioned above, you can first start digging into the exact process. One thing to also look into, is garbage collection. You can use :erlang.garbage_collect to force garbage collection for processes. Especially if the process is long running, tuning garbage collection may be necessary.
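A small sketch of forcing a collection and comparing memory before and after. The spawned process is hypothetical: it builds a large list, drops it, and then waits. The actual numbers depend on the VM, so none are shown.

```elixir
# Hypothetical long-running process that allocated a large intermediate value and then dropped it.
pid =
  spawn(fn ->
    _tmp = Enum.to_list(1..500_000)

    receive do
      :stop -> :ok
    end
  end)

# Give the process time to build the list and block in receive.
Process.sleep(100)

{:memory, before_gc} = Process.info(pid, :memory)

# Forces a garbage collection of the given process; returns true if it was collected.
gc_result = :erlang.garbage_collect(pid)

{:memory, after_gc} = Process.info(pid, :memory)
IO.inspect({before_gc, after_gc}, label: "bytes before/after GC")
```

If memory drops sharply after the forced collection, the process was mostly holding garbage, which points at GC tuning rather than a genuine leak.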

sasajuric · March 18, 2019, 1:16pm

AFAIK, it’s not possible to get what you’re looking for out of the box. To get some clues, you could use a combination of the following:

memory usage of the process
process stack trace
message queue length
total binary memory usage
total ETS memory usage

In case you don’t know which process is the offender, you might also need to include :initial_call and :registered_name (both available via Process.info).

Coupled with the knowledge of the code, this should help you narrow down the problem.
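The items listed above could be gathered roughly like this. `MemClues` is a hypothetical helper name; `Process.info/2` and `:erlang.memory/1` are the standard calls, and the binary and ETS figures are node-wide totals, not per process.

```elixir
defmodule MemClues do
  # Hypothetical helper: gathers the per-process and node-wide numbers listed above.
  def collect(pid) do
    keys = [:memory, :current_stacktrace, :message_queue_len, :initial_call, :registered_name]

    case Process.info(pid, keys) do
      nil ->
        # Process.info/2 returns nil when the process is no longer alive.
        {:error, :not_alive}

      info ->
        {:ok,
         %{
           process: Map.new(info),
           total_binary_bytes: :erlang.memory(:binary),
           total_ets_bytes: :erlang.memory(:ets)
         }}
    end
  end
end

{:ok, clues} = MemClues.collect(self())
IO.inspect(clues, limit: 20)
```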

hubertlepicki · March 18, 2019, 2:36pm

Right, but if a process is not a nice member of the OTP family, as in it’s a Cowboy protocol handler (i.e. the process in which Phoenix’s web requests are handled), and memory is being allocated within that same process, there is nothing that will give me insight into the allocated memory. That’s sad. I know Java has tools that can do precisely that; it’s a shame we don’t have the same.

garazdawi · March 18, 2019, 3:02pm

Maybe :erlang.system_info(:procs) is closer to what you are looking for? From it you can get the values that are currently live on the stack.
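A minimal sketch of capturing that output for offline reading. `:erlang.system_info(:procs)` returns a single binary with process and port information formatted as in a crash dump; it covers every process on the node, so it can be large on a busy system. The output path is a hypothetical choice.

```elixir
# One binary with process and port information, formatted as in Erlang crash dumps.
dump = :erlang.system_info(:procs)

# Hypothetical output location; any writable path works.
path = Path.join(System.tmp_dir!(), "procs_dump.txt")
File.write!(path, dump)

IO.puts("wrote #{byte_size(dump)} bytes to #{path}")
```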

sasajuric · March 18, 2019, 4:26pm

The stack trace should help you narrow down the problematic code. We actually had a similar situation recently, and we ended up periodically logging the processes with unusually high mem usage, together with their stack traces. In this particular situation that info was enough to completely understand the root cause. I agree that it’s not quite what you’re looking for, but it can help you in understanding the problem.

It’s also worth keeping in mind that in a general case excessive mem usage can be caused by other things, such as messages accumulated in the process queue, or refc binaries, or ets tables, so finding the cause is not necessarily as straightforward as logging the current heap of the process, which is why I mentioned a couple of different things in my previous answer.
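A sketch of the periodic logging idea, not the code from that project. `MemWatch`, the threshold, and the interval are all hypothetical; it assumes an Elixir version that has `Logger.warning/1` and `Enum.sort_by/3` with `:desc`.

```elixir
defmodule MemWatch do
  require Logger

  # Hypothetical defaults; tune for the system at hand.
  @threshold_bytes 50_000_000
  @interval_ms 10_000

  # Returns {pid, info} for every live process above the threshold, largest first.
  def heavy_processes(threshold \\ @threshold_bytes) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      case Process.info(pid, [:memory, :message_queue_len, :current_stacktrace]) do
        # The process may have exited between Process.list/0 and Process.info/2.
        nil -> []
        info -> if info[:memory] > threshold, do: [{pid, info}], else: []
      end
    end)
    |> Enum.sort_by(fn {_pid, info} -> info[:memory] end, :desc)
  end

  # Simple loop; meant to be started with spawn/1 or wrapped in a supervised Task.
  def loop(threshold \\ @threshold_bytes, interval \\ @interval_ms) do
    Enum.each(heavy_processes(threshold), fn {pid, info} ->
      Logger.warning("high memory process #{inspect(pid)}: #{inspect(info)}")
    end)

    Process.sleep(interval)
    loop(threshold, interval)
  end
end

# One-off use: list every process currently above 1 MB.
MemWatch.heavy_processes(1_000_000) |> IO.inspect(limit: 5)
```

Logging `:message_queue_len` next to the stack trace helps tell a growing mailbox apart from heap growth inside a single call.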

arnomi · March 18, 2019, 9:08pm

Is there a way to get the information of :erlang.system_info(:procs) for only a single process/port?
