A little while ago, @josevalim pointed out in this thread that Phoenix is well-suited for handling large uploads since it doesn’t load the whole file into memory but writes it to disk instead.
However, I have noticed (using Elixir 1.4.5, Erlang/OTP 20 and Phoenix 1.3.0-rc.2) that uploads of large files (as multipart/form-data, i.e. using Plug.Parsers.MULTIPART) can cause a huge increase in memory consumption:
A minimal Phoenix application reports 35 MB memory usage in Erlang Observer. When uploading a file, this increases to over 100 MB per upload. This is not affected by upload speed and happens both with a throttled 1 MB/s connection and unthrottled uploads to localhost.
Once the connection has been terminated, the memory consumption goes down again to its original value.
Here is the Observer load chart for three consecutive 1 GB uploads to localhost:
So while there is no long-term memory leaking here, could it be that there is a temporary memory leak in Plug.Parsers that gets cleaned up when the connection is terminated?
If you want to quickly try this out, I have created a minimal Phoenix app with Observer and an upload form here:
So it reads some of the stream into memory and then flushes it to disk. Seems reasonable to me.
As you can see in the screenshots you posted, only the binary heap is affected, which is exactly the part of the BEAM’s memory where I’d expect the temporary data to live.
Not “fully loading a file into memory” leaves more than enough room to at least load it partially and write it to the FS in chunks.
True, you’d expect a zigzag pattern from the buffering and flushing of the file.
However, the default read_length is 1_000_000 (source), so I’d expect it to read up to 1MB from the socket, flush it to disk and then continue reading. This would then require additional memory of 1 MB or maybe a bit more. But as you can see from the graph, the baseline binary memory increases by about 30 MB right away and then never goes below that again until the request has terminated.
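For reference, this is roughly how those limits are meant to be tuned on Plug.Parsers; the option names (:length, :read_length) are from the Plug docs, while the values here are purely illustrative:

```elixir
# Sketch of a Plug.Parsers configuration; the values are examples only.
plug Plug.Parsers,
  parsers: [:urlencoded, :multipart],
  length: 8_000_000_000,   # total request size limit in bytes
  read_length: 1_000_000   # bytes to read from the socket per call
```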
@wmnnd since you are benchmarking this aspect, could you also please try Plug master, since it has a hand-rolled file upload system with better configuration?
About the memory usage, the VM usually doesn’t bother performing garbage collection unless it really needs to. If you need to force it, consider calling :erlang.garbage_collect(self()) after Plug.Parser and see if it improves the result. Or try patching the file upload code in Plug and call it multiple times to see if it reduces the baseline.
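For what it’s worth, forcing a collection from inside the request process looks like this; treat it as a diagnostic sketch, not something to ship:

```elixir
# Diagnostic sketch: force a GC on the current process and inspect
# which refc binaries it still references afterwards.
true = :erlang.garbage_collect(self())
{:binary, refs} = Process.info(self(), :binary)
# Each entry is {binary_id, size_in_bytes, ref_count}
IO.inspect(length(refs), label: "referenced binaries")
```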
As I read it, it will read at most 1 MiB at once to build a chunk, and a chunk is 8 MiB in size. So under perfect circumstances you’ll end up with 8 reads of 1 MiB each for that chunk, resulting in the following allocations on the binary heap:
first read: 1 MiB read + 0 MiB old chunk + 1 MiB new chunk = 2 MiB
second read: 1 MiB read + 1 MiB old chunk + 2 MiB new chunk = 4 MiB
third read: 1 MiB read + 2 MiB old chunk + 3 MiB new chunk = 6 MiB
fourth read: 1 MiB read + 3 MiB old chunk + 4 MiB new chunk = 8 MiB
fifth read: 1 MiB read + 4 MiB old chunk + 5 MiB new chunk = 10 MiB
sixth read: 1 MiB read + 5 MiB old chunk + 6 MiB new chunk = 12 MiB
seventh read: 1 MiB read + 6 MiB old chunk + 7 MiB new chunk = 14 MiB
eighth read: 1 MiB read + 7 MiB old chunk + 8 MiB new chunk = 16 MiB
Of course the same will repeat for the next 8MiB chunk.
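The arithmetic above can be sketched in a few lines, assuming a 1 MiB read size and an 8 MiB chunk (all numbers in MiB):

```elixir
# Per-read binary-heap allocation for one 8 MiB chunk: each step keeps
# the 1 MiB read buffer, the old chunk, and the newly built chunk
# (1 MiB larger) alive at once.
read = 1

allocations =
  for n <- 1..8 do
    read + (n - 1) * read + n * read
  end

IO.inspect(allocations)           # [2, 4, 6, 8, 10, 12, 14, 16]
IO.inspect(Enum.sum(allocations)) # 72 MiB touched per 8 MiB chunk
```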
And remember, the GC might decide not to do anything unless necessary, so you might also see memory in your statistics that is still used but no longer referenced.
Interesting if that’s how it works … Do you know why it holds on to the entire old chunk and allocates an entire new chunk? It would seem like the same sort of optimizations made with iolists or sub-binaries would be useful … and that pattern has got to be pretty bad for fragmentation? Hrm.
It doesn’t actually hold on to the old chunk, rather the old chunk remains in memory until it gets GC’d, which will happen sometime later (probably at process/connection end when it releases the global handle for the globally allocated binaries).
TLDR: No matter which arguments you pass to Plug.Parsers, Plug.Conn will always read 8 MB chunks and write them to disk.
I’ve spent all afternoon trying different things, and I think I have found one main cause of the problem: ultimately, the :read_length setting has no effect because the naming of the opts being passed around gets confusing. Plug.Conn.read_part_body/8 has a guard clause that checks whether the chunk of data that has been read is larger than the length parameter. This length parameter gets extracted from the opts keyword list passed to Plug.Conn.read_part_body/3. However, if Plug.Conn.read_part_body/3 gets called from Plug.Parsers.MULTIPART, the :length parameter will always be unset, because Plug.Parsers.MULTIPART removes it in order to use it for enforcing the total file size limit.
Here is where it gets even more fun: it doesn’t even matter that length is not passed on, because when Plug.Conn.read_part_body/8 calls next/multipart/3, it doesn’t pass this length. Instead, it only passes on the opts keyword list (without :length). It seems that adapter.read_req_body/2 doesn’t know the :read_length option and then defaults to reading 8_000_000 bytes anyway, which renders the guard check for the chunk size in Plug.Conn.read_part_body/8 useless.
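A minimal sketch of the behaviour described above (this is not the actual Plug source, and split_opts/1 is a hypothetical helper): popping :length off the opts for the total-size check means the keyword list that eventually reaches the adapter no longer carries it:

```elixir
defmodule LengthDemo do
  # Hypothetical helper mirroring the described behaviour: :length is
  # popped off to track the total size limit, so the remaining opts
  # no longer contain it when handed further down.
  def split_opts(opts) do
    Keyword.pop(opts, :length, 8_000_000)
  end
end

{limit, rest} = LengthDemo.split_opts(length: 1_000_000, read_length: 64_000)
IO.inspect(limit) # 1_000_000
IO.inspect(rest)  # [read_length: 64_000] – :length is gone
```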
Sure, but this wouldn’t be an issue if it used an iolist as it wouldn’t be conglomerating new data to old in memory. It would just be a list of 1MiB (or whatever) chunks.
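To illustrate the suggestion, here is a sketch of accumulating reads as iodata instead of one growing binary; nothing is copied while collecting, and only the final IO.iodata_to_binary/1 (or writing the iolist straight to a device) materialises the bytes:

```elixir
# Accumulate chunks as iodata: each step only wraps the previous
# accumulator and the new chunk in a two-element list, so no binary
# is concatenated or copied while reading.
chunks = ["part1", "part2", "part3"]
iodata = Enum.reduce(chunks, [], fn chunk, acc -> [acc, chunk] end)

# Flattened only once, when the data is actually needed:
IO.inspect(IO.iodata_to_binary(iodata)) # "part1part2part3"
```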