A little while ago, @josevalim pointed out in this thread that Phoenix is well-suited for handling large uploads since it doesn’t load the whole file into memory but writes it to disk instead.
However, I have noticed (using Elixir 1.4.5, Erlang/OTP 20 and Phoenix 1.3.0-rc.2) that uploads of large files (as multipart/form-data, i.e. using Plug.Parsers.MULTIPART) can cause a huge increase in memory consumption:
A minimal Phoenix application reports 35 MB memory usage in Erlang Observer. When uploading a file, this increases to over 100 MB per upload. This is not affected by upload speed and happens both with a throttled 1 MB/s connection and unthrottled uploads to localhost.
Once the connection has been terminated, the memory consumption goes down again to its original value.
Here is the Observer load chart for three subsequent 1 GB uploads to localhost:
So while there is no long-term memory leaking here, could it be that there is a temporary memory leak in
Plug.Parsers that gets cleaned up when the connection is terminated?
If you want to quickly try this out, I have created a minimal Phoenix app with Observer and an upload form here:
And for added detail, here are the Load Charts for the first 60s of a throttled upload:
So it reads some of the stream into memory and then flushes it to disk. Seems reasonable to me.
As you can see in your posted screenshots, only the binary heap is affected, which is exactly the part of the BEAM’s memory where I’d expect the temporary data to live.
Not “fully loading a file into memory” leaves more than enough room to at least load it partially and write it to the file system in chunks.
True, you’d expect a zigzag pattern from the buffering and flushing of the file.
However, the default :read_length is 1_000_000 (source), so I’d expect it to read up to 1 MB from the socket, flush it to disk and then continue reading. This would require an additional 1 MB of memory, or maybe a bit more. But as you can see from the graph, the baseline binary memory increases by about 30 MB right away and then never goes below that again until the request has terminated.
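For reference, the two options involved can be set on Plug.Parsers like this. This is only a sketch of a parser configuration; the option names are real, and the values are the ones discussed in this thread:

```elixir
# Sketch of a Plug.Parsers configuration in an endpoint.
# :length is the maximum allowed total body size;
# :read_length is how much is read from the socket per call (default 1_000_000).
plug Plug.Parsers,
  parsers: [:urlencoded, :multipart, :json],
  length: 1_000_000_000,
  read_length: 1_000_000
```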
@wmnnd since you are benchmarking this aspect, could you also please try Plug master, since it has a hand-rolled file upload system with better configuration?
About the memory usage, the VM usually doesn’t bother performing garbage collection unless it really needs to. If you need to force it, consider calling :erlang.garbage_collect(self()) after Plug.Parsers and see if it improves the result. Or try patching the file upload code in Plug and calling it multiple times to see if it reduces the baseline.
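A standalone sketch (not Plug code) of what such a forced GC does to the per-process refc-binary accounting that Observer reports. The binary size and the process-dictionary trick are just illustration devices:

```elixir
# Sum the sizes of the refc binaries currently held by this process.
bin_bytes = fn ->
  {:binary, refs} = Process.info(self(), :binary)
  refs |> Enum.map(fn {_id, size, _refc} -> size end) |> Enum.sum()
end

# Allocate a large reference-counted binary, roughly like an
# intermediate multipart chunk, and hold on to it briefly.
Process.put(:big_chunk, :binary.copy(<<0>>, 10_000_000))
before_gc = bin_bytes.()

# Drop the only reference and force a collection.
Process.delete(:big_chunk)
:erlang.garbage_collect(self())
after_gc = bin_bytes.()

# after_gc should be roughly 10 MB lower than before_gc.
IO.puts("binary bytes before GC: #{before_gc}, after GC: #{after_gc}")
```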
You have :length set to 1_000_000_000, try changing it to 1_000_000:
EDIT: :length is the max length, no wonder memory usage went down… (I mistakenly thought there was a separate :limit option)
:length is the maximum length. It is not how much will be loaded into memory at once.
As I read it, it will read at most 1 MiB at a time to build up a chunk, and a chunk is 8 MiB in size. So under perfect circumstances you’ll end up with 8 reads of 1 MiB each for that chunk, resulting in the following allocations on the binary heap:
- first read: 1 MiB read + 0 MiB old chunk + 1 MiB new chunk = 2 MiB
- second read: 1 MiB read + 1 MiB old chunk + 2 MiB new chunk = 4 MiB
- third read: 1 MiB read + 2 MiB old chunk + 3 MiB new chunk = 6 MiB
- fourth read: 1 MiB read + 3 MiB old chunk + 4 MiB new chunk = 8 MiB
- fifth read: 1 MiB read + 4 MiB old chunk + 5 MiB new chunk = 10 MiB
- sixth read: 1 MiB read + 5 MiB old chunk + 6 MiB new chunk = 12 MiB
- seventh read: 1 MiB read + 6 MiB old chunk + 7 MiB new chunk = 14 MiB
- eighth read: 1 MiB read + 7 MiB old chunk + 8 MiB new chunk = 16 MiB
Of course the same will repeat for the next 8MiB chunk.
And remember, the GC might decide not to do anything unless necessary, so you might see used but no-longer-referenced memory in your statistics as well.
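The arithmetic in that worst-case list can be checked quickly. For each 1 MiB read, the old accumulated chunk is kept alive while a new, larger one is allocated:

```elixir
# Worst-case binary-heap allocation per read while building one 8 MiB chunk:
# 1 MiB read + (n - 1) MiB old chunk + n MiB new chunk.
allocations_mib =
  for read <- 1..8 do
    old_chunk = read - 1
    new_chunk = read
    1 + old_chunk + new_chunk
  end

IO.inspect(allocations_mib)  # [2, 4, 6, 8, 10, 12, 14, 16]

# Peak of 16 MiB live at once, and 72 MiB of cumulative allocation
# traffic for every 8 MiB actually written to disk.
IO.puts("peak: #{List.last(allocations_mib)} MiB, total: #{Enum.sum(allocations_mib)} MiB")
```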
@josevalim I’ve already tried adding forced garbage collection, which does in fact bring the memory consumption down. I will give Plug master a try.
@outlog :length is the setting for the maximum allowed size of the uploaded file, not for the individual chunks written to disk.
Interesting if that’s how it works … Do you know why it holds on to the entire old chunk and allocates an entire new chunk? It would seem like the same sort of optimizations made with iolists or sub-binaries would be useful … and that pattern has got to be pretty bad for fragmentation? Hrm.
@aseigo It seems like the function doing the reading is recursively calling itself so maybe that’s why it doesn’t let go of the chunks.
In which case one would hope it would be using an iolist to keep the data together …
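For illustration, here is a small sketch of the difference (not the actual Plug code): binary concatenation conceptually re-copies the accumulator on every step, while an iolist just nests references to the existing chunks:

```elixir
# Stand-ins for the 1 MiB reads discussed above (1 KiB here for brevity).
chunks = for _ <- 1..8, do: :binary.copy(<<0>>, 1024)

# Concatenation: each step builds a new, larger binary from old ++ new.
concatenated = Enum.reduce(chunks, <<>>, fn chunk, acc -> acc <> chunk end)

# Iolist: each step just wraps the existing data in a new list cell,
# copying nothing; the chunks stay as separate binaries.
iolist = Enum.reduce(chunks, [], fn chunk, acc -> [acc, chunk] end)

# Both represent the same bytes:
true = IO.iodata_to_binary(iolist) == concatenated

# File writes accept iodata directly, so the iolist never needs flattening:
# IO.binwrite(file, iolist)
IO.puts("iodata length: #{IO.iodata_length(iolist)} bytes")
```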
I haven’t looked at the implementation, I just made a worst case assumption here.
You can also try setting a lower read_length and seeing how that impacts the overall memory. It implies more reads and writes overall, though.
It doesn’t actually hold on to the old chunk, rather the old chunk remains in memory until it gets GC’d, which will happen sometime later (probably at process/connection end when it releases the global handle for the globally allocated binaries).
TLDR: No matter which arguments you pass to it, Plug.Conn will always read chunks of 8 MB and write them to disk.
I’ve spent all afternoon trying different things, but I think I have found one main cause of the problem: Ultimately, the :read_length setting has no effect because the naming of the opts being passed around gets confusing.
Plug.Conn.read_part_body/8 has a guard clause that checks whether the chunk of data that has been read is larger than the length parameter. This length parameter gets extracted from the opts keyword list passed to Plug.Conn.read_part_body/3. However, when Plug.Conn.read_part_body/3 gets called from Plug.Parsers.MULTIPART, the :length parameter will always be unset because Plug.Parsers.MULTIPART removes it in order to use it for evaluating the total file size limit.
Here is where it gets even more fun: It doesn’t even matter that :length is not passed on, because when next/multipart/3 is called, it doesn’t pass this length either. Instead, it only passes on the opts keyword list (without :length). It seems that adapter.read_req_body/2 doesn’t know the :read_length option and then defaults to reading 8_000_000 bytes anyway, which renders the guard check for the chunk size in Plug.Conn.read_part_body/8 ineffective.
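The option-dropping described here can be illustrated with plain Keyword functions. This is a toy sketch of the mechanism, not the actual Plug source:

```elixir
# The options as configured on Plug.Parsers.
opts = [length: 1_000_000_000, read_length: 1_000_000]

# The multipart parser pops :length off to track the total size limit…
{total_limit, rest} = Keyword.pop(opts, :length)
true = total_limit == 1_000_000_000

# …so downstream code that looks up :length finds nothing:
nil = Keyword.get(rest, :length)

# And an adapter that only understands :length (and ignores :read_length)
# falls back to its own default read size of 8_000_000 bytes:
adapter_read = Keyword.get(rest, :length, 8_000_000)
IO.puts("adapter reads #{adapter_read} bytes per call")
```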
Sure, but this wouldn’t be an issue if it used an iolist, as it wouldn’t be concatenating new data onto old in memory. It would just be a list of 1 MiB (or whatever) chunks.
I’m actually surprised it is not using some lower level primitive to stream the TCP file data to disk…
That is some nice code spelunking … looking forward to seeing what the improvement will end up being.
That was an excellent bug report, thanks for it!