I have a staging application (thank goodness it's not in production yet), and it started crashing my server because it was using too much memory. However, I'm not really sure why, since I haven't made changes in a long time.
In my logs it says:
Aug 15 06:58:17 PM [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
Aug 15 06:58:17 PM [os_mon] memory supervisor port (memsup): Erlang has closed
Where should I be looking to figure out what's causing this?
To be clear, the above does not mean your server is running out of memory. It just means the Erlang tooling for measuring memory/CPU usage has terminated, which is always logged when Erlang shuts down.
So, without further evidence, all we know is that Erlang is shutting down. Do your logs say something else? Do you have metrics that say something else?
OOM errors are hard to debug. :observer and LiveDashboard may help. However, when shit happens, it usually happens quickly enough that you don't get the chance to observe it clearly.
I can only offer a few high-memory pitfalls that I have seen:
Do you have processes that do a lot of work and then idle for a long time? That can keep global binaries from being GC'ed soon enough. You can try to make those processes short-lived, or hibernate them.
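For illustration, here's a hedged sketch (the module name and the stand-in work function are made up) of a GenServer that hibernates after each burst of work:

```elixir
defmodule BurstWorker do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:work, payload}, _from, state) do
    # Stand-in for the real heavy work.
    result = byte_size(payload)

    # Returning :hibernate runs a full-sweep GC and compacts the heap,
    # dropping references that would otherwise pin large binaries
    # while the process sits idle.
    {:reply, result, state, :hibernate}
  end
end
```

`Process.hibernate/3` does the same for hand-rolled process loops, and `:erlang.garbage_collect(pid)` can force a collection without hibernating.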
Do you read and parse largish files? You may try to use :raw mode to open files and tune the read_ahead size.
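As a sketch (the function name is made up, and the 64 KB chunk/read_ahead size is an arbitrary starting point to tune for your workload), reading a file in :raw mode with a read_ahead buffer looks like:

```elixir
defmodule RawReader do
  @chunk 64 * 1024

  # Sums a file's size chunk by chunk. :raw bypasses the intermediate
  # file-server process, and read_ahead buffers sequential reads.
  def byte_count(path) do
    {:ok, fd} = :file.open(path, [:raw, :binary, {:read_ahead, @chunk}])

    try do
      count(fd, 0)
    after
      :file.close(fd)
    end
  end

  defp count(fd, acc) do
    case :file.read(fd, @chunk) do
      {:ok, data} -> count(fd, acc + byte_size(data))
      :eof -> acc
    end
  end
end
```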
Do you make a lot of sub-strings and keep them around for a long time? A sub-binary will keep the original large binary from being GC'ed. You can try :binary.copy/1 on them.
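The effect is easy to see with :binary.referenced_byte_size/1:

```elixir
big = :binary.copy(<<0>>, 1_000_000)

# binary_part/3 returns a sub-binary: 10 visible bytes that still
# reference the whole 1 MB original and keep it from being collected.
sub = binary_part(big, 0, 10)

# :binary.copy/1 allocates a standalone copy, so the original can be
# GC'ed once nothing else holds it.
owned = :binary.copy(sub)

:binary.referenced_byte_size(sub)    # => 1_000_000
:binary.referenced_byte_size(owned)  # => 10
```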
> So, without further evidence, all we know is that Erlang is shutting down. Do your logs say something else? Do you have metrics that say something else?
I am using Render, and I noticed the server goes unhealthy, then dies, then restarts. When I looked at the logs, those were the error messages I noticed before it restarted.
Open up the dashboard and you will be able to see if memory is growing, processes used, etc.
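In case it helps, a minimal LiveDashboard mount (the route path, :browser pipeline, and MyAppWeb.Telemetry module are assumptions for your app) looks like this in the Phoenix router:

```elixir
import Phoenix.LiveDashboard.Router

scope "/" do
  pipe_through :browser

  # The home page charts total memory over time; the Processes page can
  # be sorted by memory or reductions to spot heavy processes.
  live_dashboard "/dashboard", metrics: MyAppWeb.Telemetry
end
```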
I have that installed, but the server had already been restarted by then, so my uptime was pretty short. I ended up actually upping the server RAM and it seemed to be okay after that, although I don't believe that is the right fix.
I'll revert to the smaller RAM tomorrow when it's not being used and then try checking for the things you mentioned again.
Yeah, it’s a bit tough. In my case, it happens when I hit an API endpoint and I haven’t been able to find a culprit. It could be a long idle process, but if my server is restarting then that idle process would have died. Thanks for the helpful hints. I’ll try to look more closely.
Optimizing RAM usage is usually not worth the effort; this is a compromise GC'd languages make.
RAM usage spikes are one thing and memory leaks are another, and judging by your description, you most probably have a spike.
For the record, what are the specs of your machine?
Not for nothing, but this is what I find APMs (like AppSignal, Scout, DataDog, NewRelic, etc) great for. Doesn’t always give you what you need, but more often than not you can see what was happening when things went off the rails.
> For the record, what are the specs of your machine?
512 MB, then I upgraded to 2 GB.
> RAM usage spikes are one thing and memory leaks are another, and judging by your description, you most probably have a spike.
I just can't imagine my small application spiking up to that point, so I figured I must have a bug. Although 512 MB might be too small, what do you think?
I actually have New Relic on this staging machine. I tried looking around but didn't notice anything strange. I was rushing to fix it, so I might have needed to look closer. I'm also not too familiar with New Relic; which page would you recommend I look at?
Yeah, I like Rollbar too, although when I last used it, it was just error reporting, not APM. I really like the "RSQL" or whatever they call it, their query language/tool.
Ah sorry, I meant it only for errors, yeah. I'm investing in writing code for ingesting anything and everything into OpenObserve lately, though it doesn't have the ready-made dashboards that e.g. New Relic has.