Memory leak with Quantum and processing binary files

Currently I have a Phoenix app which also schedules Quantum jobs (GitHub - quantum-elixir/quantum-core: ⌚ Cron-like job scheduler for Elixir) with overlap: false mode.
The jobs are importing and exporting relatively large JSON files.
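
For context, the scheduler setup is roughly like this (module and job names here are placeholders, not my real ones):

    # config/config.exs
    # MyApp.Scheduler is a module that calls: use Quantum, otp_app: :my_app
    import Config

    config :my_app, MyApp.Scheduler,
      jobs: [
        import_job: [
          # run once a day at 02:00
          schedule: "0 2 * * *",
          task: {MyApp.ImportTask, :run, []},
          # skip a run if the previous one is still going
          overlap: false
        ]
      ]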

Now my problem is leaking memory:

[Screenshot from 2020-03-06 16-29-16]

This is the current status, and it will get worse since different/bigger data sets are going to be processed.

For example, at start (on the left) the app uses ~60MB; the flat line at the end is after the Quantum job was processed:

[Screenshot from 2020-03-06 16-34-47]

I also checked this: memory management - Solving large binaries leak - Stack Overflow, but forced GC / tuning ERL_FULLSWEEP_AFTER did not seem to help.
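
For reference, what I tried from that thread was along these lines (the value 10 is just an example):

    # force a collection in the process that just finished the work
    :erlang.garbage_collect()

    # make every 10th generational GC a fullsweep for newly spawned
    # processes (same idea as the ERL_FULLSWEEP_AFTER environment variable)
    :erlang.system_flag(:fullsweep_after, 10)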

If I run ImportTask/ExportTask on my local machine, but not via Quantum, and observe memory, then I cannot reproduce this behaviour.

Any tips welcome :slight_smile:

Have you tried strings: :copy (assuming you use Jason)?
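
Something like this, assuming the file is small enough to read into memory in one go (path here stands for your file):

    # strings: :copy makes Jason copy every decoded string, so the decoded
    # values do not hold sub-binary references into the whole file binary
    data =
      path
      |> File.read!()
      |> Jason.decode!(strings: :copy)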


Have you tried calling :erlang.garbage_collect() at the end of whatever function you’re having Quantum run?
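
I mean something like this at the end of whatever function the job points at (run_job and ImportTask.run/0 here are only placeholders):

    # do the work, then force a full GC of the process Quantum ran it in
    def run_job do
      result = ImportTask.run()
      :erlang.garbage_collect()
      result
    end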

@NobbZ jackpot!
As a matter of fact, I was using Jaxon for parsing since the files were pretty big before, but I just tested with Jason strings: :copy and the issue does not occur anymore. I suspect this was a misuse of Jaxon.Stream on my side, but I am completely satisfied with the Jason results.

@benwilson512 yes, I did try that, without any effect.

Thank you so much for your input guys


From a quick glance it seems as if jaxon wasn’t copying and also did not provide an option to copy binaries.

You might still be able to use jaxon if you manually :binary.copy/1 the strings it gives you.
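
For example with a small helper like the one below (BinaryCopy and deep_copy/1 are made-up names), applied right after Jaxon.Stream.query/2, e.g. with Stream.map(&BinaryCopy.deep_copy/1):

    defmodule BinaryCopy do
      # walk a decoded JSON value and copy every binary, so the result no
      # longer references the original (large) file binary
      def deep_copy(value) when is_binary(value), do: :binary.copy(value)

      def deep_copy(value) when is_map(value),
        do: Map.new(value, fn {k, v} -> {deep_copy(k), deep_copy(v)} end)

      def deep_copy(value) when is_list(value),
        do: Enum.map(value, &deep_copy/1)

      def deep_copy(value), do: value
    end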


Basically I have, let's say, a 100MB file.

What I was doing was something like:

    path
    |> File.stream!()
    |> Jaxon.Stream.query([:root, :all])
    |> Stream.chunk_every(@chunk_size)
    |> Enum.map(... process chunk)

Now, when the process finished, there was something like XXXMB leftover after each iteration. Anyway, what I am searching for is a kind of streaming parser, and it seems the code below is a misuse on my side, because memory spikes immediately to ~1.5GB and then drops sequentially with each chunk (not like with Jaxon, as stated above).

That is probably because of Enum.map; depending on what it does exactly, it will force the full stream.

Also, if you keep references to binaries in that Enum.map instead of using :binary.copy/1 as suggested, you might keep all the lines in memory until the sub-references get collected.

But overall, this is hard to say without knowing how you process the chunks…

I suspected Enum.map, but it does only this:

Inserts the chunk into Postgres (let's say ~1000 items with subresources)
Returns a struct %ChunkRepory{success: 700, failed: 300}

The map result is aggregated later into a sum of those maps.
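
The aggregation itself is roughly this (reports being the list that Enum.map returns):

    # fold the per-chunk reports into one overall report
    Enum.reduce(reports, %ChunkRepory{success: 0, failed: 0}, fn report, acc ->
      %ChunkRepory{
        success: acc.success + report.success,
        failed: acc.failed + report.failed
      }
    end)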