A little while ago, @josevalim pointed out in this thread that Phoenix is well-suited for handling large uploads since it doesn’t load the whole file into memory but writes it to disk instead.
However, I have noticed (using Elixir 1.4.5, Erlang/OTP 20 and Phoenix 1.3.0-rc.2) that uploads of large files (as multipart/form-data, i.e. using Plug.Parsers.MULTIPART) can cause a huge increase in memory consumption:
A minimal Phoenix application reports 35 MB memory usage in Erlang Observer. When uploading a file, this increases to over 100 MB per upload. This is not affected by upload speed and happens both with a throttled 1 MB/s connection and unthrottled uploads to localhost.
Once the connection has been terminated, the memory consumption goes down again to its original value.
Here is the Observer load chart for three subsequent 1 GB uploads to localhost:
So while there is no long-term memory leaking here, could it be that there is a temporary memory leak in
Plug.Parsers that gets cleaned up when the connection is terminated?
If you want to quickly try this out, I have created a minimal Phoenix app with Observer and an upload form here:
And for added detail, here are the Load Charts for the first 60s of a throttled upload:
So it reads some of the stream into memory and then flushes it to disk. Seems reasonable to me.
As you can see in your posted screenshots, only the binary heap is affected, which is exactly the part of the BEAM’s memory where I’d expect the temporary data to live.
Not “fully loading a file into memory” leaves more than enough room to at least load it partially and write it to the file system in chunks.
True, you’d expect a zigzag pattern from the buffering and flushing of the file.
However, the default :read_length is 1_000_000 (source), so I’d expect it to read up to 1 MB from the socket, flush it to disk and then continue reading. This would require an additional 1 MB of memory, or maybe a bit more. But as you can see from the graph, the baseline binary memory increases by about 30 MB right away and then never goes below that again until the request has terminated.
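For reference, the two options involved can be set on Plug.Parsers like this. This is only a sketch of a parser configuration; the option names are real, and the values are the ones discussed in this thread:

```elixir
# Sketch of a Plug.Parsers configuration in an endpoint.
# :length is the maximum allowed total body size;
# :read_length is how much is read from the socket per call (default 1_000_000).
plug Plug.Parsers,
  parsers: [:urlencoded, :multipart, :json],
  length: 1_000_000_000,
  read_length: 1_000_000
```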
@wmnnd since you are benchmarking this aspect, could you also please try Plug master, since it has a hand-rolled file upload system with better configuration?
About the memory usage, the VM usually doesn’t bother performing garbage collection unless it really needs to. If you need to force it, consider calling :erlang.garbage_collect(self()) after Plug.Parsers and see if it improves the result. Or try patching the file upload code in Plug and calling it multiple times to see if it reduces the baseline.
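A standalone sketch (not Plug code) of what such a forced GC does to the per-process refc-binary accounting that Observer reports. The binary size and the process-dictionary trick are just illustration devices:

```elixir
# Sum the sizes of the refc binaries currently held by this process.
bin_bytes = fn ->
  {:binary, refs} = Process.info(self(), :binary)
  refs |> Enum.map(fn {_id, size, _refc} -> size end) |> Enum.sum()
end

# Allocate a large reference-counted binary, roughly like an
# intermediate multipart chunk, and hold on to it briefly.
Process.put(:big_chunk, :binary.copy(<<0>>, 10_000_000))
before_gc = bin_bytes.()

# Drop the only reference and force a collection.
Process.delete(:big_chunk)
:erlang.garbage_collect(self())
after_gc = bin_bytes.()

# after_gc should be roughly 10 MB lower than before_gc.
IO.puts("binary bytes before GC: #{before_gc}, after GC: #{after_gc}")
```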
You have :length set to 1_000_000_000, try changing it to 1_000_000:
EDIT: :length is the max length, no wonder memory usage went down… (I mistakenly thought there was a separate :limit option)
:length is the maximum length. It is not how much will be loaded into memory at once.
As I read it, it will read at most 1 MiB at a time to build up a chunk, and a chunk is 8 MiB in size. So under perfect circumstances you’ll end up with 8 reads of 1 MiB each for that chunk, resulting in the following allocations on the binary heap:
- first read: 1 MiB read + 0 MiB old chunk + 1 MiB new chunk = 2 MiB
- second read: 1 MiB read + 1 MiB old chunk + 2 MiB new chunk = 4 MiB
- third read: 1 MiB read + 2 MiB old chunk + 3 MiB new chunk = 6 MiB
- fourth read: 1 MiB read + 3 MiB old chunk + 4 MiB new chunk = 8 MiB
- fifth read: 1 MiB read + 4 MiB old chunk + 5 MiB new chunk = 10 MiB
- sixth read: 1 MiB read + 5 MiB old chunk + 6 MiB new chunk = 12 MiB
- seventh read: 1 MiB read + 6 MiB old chunk + 7 MiB new chunk = 14 MiB
- eighth read: 1 MiB read + 7 MiB old chunk + 8 MiB new chunk = 16 MiB
Of course the same will repeat for the next 8MiB chunk.
And remember, the GC might decide not to do anything unless necessary, so you might see used but no-longer-referenced memory in your statistics as well.
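The arithmetic in that worst-case list can be checked quickly. For each 1 MiB read, the old accumulated chunk is kept alive while a new, larger one is allocated:

```elixir
# Worst-case binary-heap allocation per read while building one 8 MiB chunk:
# 1 MiB read + (n - 1) MiB old chunk + n MiB new chunk.
allocations_mib =
  for read <- 1..8 do
    old_chunk = read - 1
    new_chunk = read
    1 + old_chunk + new_chunk
  end

IO.inspect(allocations_mib)  # [2, 4, 6, 8, 10, 12, 14, 16]

# Peak of 16 MiB live at once, and 72 MiB of cumulative allocation
# traffic for every 8 MiB actually written to disk.
IO.puts("peak: #{List.last(allocations_mib)} MiB, total: #{Enum.sum(allocations_mib)} MiB")
```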
@josevalim I’ve already tried adding forced garbage collection, which does in fact bring the memory consumption down. I will give Plug master a try.
@outlog :length is the setting for the maximum allowed size of the uploaded file, not for the individual chunks written to disk.
Interesting if that’s how it works … Do you know why it holds on to the entire old chunk and allocates an entire new chunk? It would seem like the same sort of optimizations made with iolists or sub-binaries would be useful … and that pattern has got to be pretty bad for fragmentation? Hrm.
@aseigo It seems like the function doing the reading is recursively calling itself so maybe that’s why it doesn’t let go of the chunks.
In which case one would hope it would be using an iolist to keep the data together …
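For illustration, here is a small sketch of the difference (not the actual Plug code): binary concatenation conceptually re-copies the accumulator on every step, while an iolist just nests references to the existing chunks:

```elixir
# Stand-ins for the 1 MiB reads discussed above (1 KiB here for brevity).
chunks = for _ <- 1..8, do: :binary.copy(<<0>>, 1024)

# Concatenation: each step builds a new, larger binary from old ++ new.
concatenated = Enum.reduce(chunks, <<>>, fn chunk, acc -> acc <> chunk end)

# Iolist: each step just wraps the existing data in a new list cell,
# copying nothing; the chunks stay as separate binaries.
iolist = Enum.reduce(chunks, [], fn chunk, acc -> [acc, chunk] end)

# Both represent the same bytes:
true = IO.iodata_to_binary(iolist) == concatenated

# File writes accept iodata directly, so the iolist never needs flattening:
# IO.binwrite(file, iolist)
IO.puts("iodata length: #{IO.iodata_length(iolist)} bytes")
```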
I haven’t looked at the implementation, I just made a worst case assumption here.
You can also try setting a lower read_length and seeing how that impacts the overall memory. It implies more reads and writes overall, though.
It doesn’t actually hold on to the old chunk, rather the old chunk remains in memory until it gets GC’d, which will happen sometime later (probably at process/connection end when it releases the global handle for the globally allocated binaries).
TLDR: No matter which arguments you pass to it, Plug.Conn will always read chunks of 8 MB and write them to disk.
I’ve spent all afternoon trying different things, but I think I have found one main cause of the problem: Ultimately, the :read_length setting has no effect because the naming of the opts being passed around gets confusing.
Plug.Conn.read_part_body/8 has a guard clause that checks whether the chunk of data that has been read is larger than the length parameter. This length parameter gets extracted from the opts keyword list passed to Plug.Conn.read_part_body/3. However, when Plug.Conn.read_part_body/3 gets called from Plug.Parsers.MULTIPART, the :length parameter will always be unset because Plug.Parsers.MULTIPART removes it in order to use it for evaluating the total file size limit.
Here is where it gets even more fun: It doesn’t even matter that :length is not passed on, because when next/multipart/3 is called, it doesn’t pass this length either. Instead, it only passes on the opts keyword list (without :length). It seems that adapter.read_req_body/2 doesn’t know the :read_length option and then defaults to reading 8_000_000 bytes anyway, which renders the guard check for the chunk size in Plug.Conn.read_part_body/8 ineffective.
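The option-dropping described here can be illustrated with plain Keyword functions. This is a toy sketch of the mechanism, not the actual Plug source:

```elixir
# The options as configured on Plug.Parsers.
opts = [length: 1_000_000_000, read_length: 1_000_000]

# The multipart parser pops :length off to track the total size limit…
{total_limit, rest} = Keyword.pop(opts, :length)
true = total_limit == 1_000_000_000

# …so downstream code that looks up :length finds nothing:
nil = Keyword.get(rest, :length)

# And an adapter that only understands :length (and ignores :read_length)
# falls back to its own default read size of 8_000_000 bytes:
adapter_read = Keyword.get(rest, :length, 8_000_000)
IO.puts("adapter reads #{adapter_read} bytes per call")
```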
Sure, but this wouldn’t be an issue if it used an iolist, as it wouldn’t be concatenating new data onto old in memory. It would just be a list of 1 MiB (or whatever) chunks.
I’m actually surprised it is not using some lower level primitive to stream the TCP file data to disk…
That is some nice code spelunking … looking forward to seeing what the improvement will end up being.
That was an excellent bug report, thanks for it!