This simple code eats system memory and kills the VM - is there a way to protect against such memory leaks?

I accidentally wrote the module below. I thought the first ‘perform(url)’ function, when called, would simply call the second one. Instead, calling QueeJob.Url.perform("test") eats the entire system memory and kills the VM. I know the code is wrong, and the crash dump points to the garbage collector going crazy over it. This raised the question of how one can fight potential memory leaks in an app. We have supervisors that help us when something crashes, but is there a way to protect against memory leaks like this, which kill the entire VM?

defmodule QueeJob.Url do
  def perform(url) do
    perform({url, 0})
  end

  def perform({url, sleep}) do
    :timer.sleep(sleep)
    IO.puts("URL processed: " <> url)
  end
end
2 Likes

No, there is no way to prevent this kind of bug. If you need a highly reliable system, you need at least two machines anyway. The only good advice here is: don't write this sort of bug, and test thoroughly.

Oops… Does that mean you can crash the whole VM with properly forged parameters?

I could use Dialyzer (I don't like the syntax) or guard clauses, but there must be a way to limit recursion depth/memory per process or function?

1 Like

This code doesn’t look wrong to me, and I’m not getting the point. What’s wrong with it?
It froze my computer and I had to press the power button to shut it down.

Would somebody please explain it to me?

@pillaiindu: The first clause perform(url) always wins; the second one is never matched.

Adding a when is_binary(url) guard to the first clause, or simply swapping the two clauses, would fix the issue.
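
A minimal sketch of the guard variant, reusing the names from the original module:

defmodule QueeJob.Url do
  # Only matches when the argument is a plain binary URL.
  def perform(url) when is_binary(url) do
    perform({url, 0})
  end

  # Matches the {url, sleep} tuple built by the clause above.
  def perform({url, sleep}) do
    :timer.sleep(sleep)
    IO.puts("URL processed: " <> url)
  end
end

Now perform("test") only matches the first clause once, and the recursive call falls through to the tuple clause instead of looping forever.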

2 Likes

You can set a max_heap_size per process:
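
A hedged sketch of what that can look like; the 1_000_000-word limit is just an illustrative number, not a recommendation:

# Kill only this process (not the whole VM) once its heap exceeds the limit.
Process.flag(:max_heap_size, %{size: 1_000_000, kill: true, error_logger: true})

# Or set it at spawn time via the raw Erlang spawn options:
:erlang.spawn_opt(
  fn -> QueeJob.Url.perform("test") end,
  [{:max_heap_size, %{size: 1_000_000, kill: true, error_logger: true}}]
)

With kill: true the runaway process is terminated by the VM once its heap passes the limit, instead of the whole node running out of memory. There is also an emulator flag (+hmax, if I recall correctly) for setting a VM-wide default.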

I’m still not getting the point.

I think the first function, whenever called, will call the second function, and the second function will do its job.

Why will the first function always win, if it is calling the second function explicitly?

The first function takes one argument: url
The second function takes two arguments: url and sleep
And {url, sleep} is one argument, not two.

2 Likes

There is no such thing as “explicit call.” Both clauses have arity 1 and the former accepts literally everything.

2 Likes

perform(url) will accept perform({whatever, whatever_else}).

They compile as separate clauses of the same function.

1 Like

Yeah, now I got the point!

That made the point clear.

Nope, the second clause takes one argument: {url, sleep}.

The first function clause accepts any call with one parameter. The second function clause also expects one parameter: a tuple containing two elements. However, it’s never called, since the first clause always matches first.

A good rule of thumb is to always place more specific pattern matches above less specific clauses.
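
A minimal sketch of that reordering, keeping the original module's intent but putting the most specific clause first:

defmodule QueeJob.Url do
  # Most specific clause first: only matches a two-element tuple.
  def perform({url, sleep}) do
    :timer.sleep(sleep)
    IO.puts("URL processed: " <> url)
  end

  # Catch-all clause last: wraps a bare URL and recurses into the tuple clause.
  def perform(url) do
    perform({url, 0})
  end
end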

3 Likes

True… I missed that :smiley: That’s why this code is so tricky :wink:

2 Likes

He meant one argument (the tuple) with two values (the url and the sleep).

Edit: tuple, not struct

This is a tuple, not a struct. And it does not really matter how many elements it has.

I actually meant two arguments, but I’m glad you understand why this code does not work the way the person who wrote it would like it to work.

2 Likes

Well, the compiler does warn you that the second clause will never match, because the previous clause always matches.

I’d use a URL struct or a different function name.

A good idea is to start with the most specific match and put less specific matches after it (which is why the compiler warns you about that :).
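
A rough sketch of the struct idea; the struct and its field names here are only an illustration, not from the original thread:

defmodule QueeJob.Url do
  defstruct [:url, sleep: 0]

  # Dispatching on the struct means a bare binary can never be confused with a job.
  def perform(%__MODULE__{url: url, sleep: sleep}) do
    :timer.sleep(sleep)
    IO.puts("URL processed: " <> url)
  end

  def perform(url) when is_binary(url) do
    perform(%__MODULE__{url: url})
  end
end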

1 Like

I’ll give the two functions different arities instead of using a guard clause.
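
A sketch of that arity-based variant; perform/1 and perform/2 are then separate functions, so neither can shadow the other:

defmodule QueeJob.Url do
  # perform/1 delegates to perform/2 with a default sleep of 0.
  def perform(url), do: perform(url, 0)

  # perform/2 does the actual work.
  def perform(url, sleep) do
    :timer.sleep(sleep)
    IO.puts("URL processed: " <> url)
  end
end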

Thanks, something to explore.
But how would you enforce these memory constraints on the code above?
I’ve also found an interesting thread; not sure if it still applies, though:

From Java Monitor - The Latest Java News

There are no mechanisms in the Erlang VM to curb the growth of the memory. The VM will happily allocate so much memory that the system shoots into swap, or that the virtual memory is exhausted. These may cause the machine to become unresponsive even to KVM console access. In the past we have had to power cycle machines to get access to them again.

The queue-based programming model that makes Erlang so much fun to write code for is also its Achilles heel in production. Every queue in Erlang is unbounded. The VM will not throw exceptions or limit the number of messages in a queue. Sometimes a process stops processing due to a bug, or a process fails to keep up with the flow of messages being sent to it. In that case, Erlang will simply allow the queue for that process to grow until either the VM is killed or the machine locks up, whichever comes first.

This means that when you run large Erlang VM’s in a production environment you need to have OS-level checks that will kill the process if memory use skyrockets. Remote hands for the machine, or remote access cards is a must-have for machines that run large Erlang VM’s.

For people using the BEAM in production, does this still apply?
If so, is there a way to circumvent that kind of behaviour?

2 Likes