I’m creating an HTTP wrapper to download files as a stream. Currently, it works with :httpc, :hackney, and :ibrowse, and it has the following functions:
stream/2 which creates an Elixir Stream.
read/2 which gets the content and returns it as a string.
download/2 which downloads to a file.
I realized that I can implement the other two functions using just stream/2. This would simplify the library code a lot, but before making the change I wanted to benchmark it. I’m surprised by the results for a simple test with a 1MB file: if I use the stream/2 function, memory consumption is much higher (at the same speed):
read = fn -> {:ok, r} = Down.read(url, backend: :httpc); r end
read_stream = fn ->
  url
  |> Down.stream(backend: :httpc)
  |> Enum.into([])
  |> IO.iodata_to_binary()
end
The Down.read/2 function internally uses a strategy very similar to the read_stream function above: every time it receives a new chunk, it appends it to a list, and when it finishes, it calls the same IO.iodata_to_binary/1.
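For illustration, here is a minimal sketch of that accumulate-then-join pattern (this is not Down’s actual internals, and the chunk data is made up):

```elixir
# Pretend these chunks arrive one at a time from the socket.
chunks = ["Hello", ", ", "world", "!"]

iodata =
  Enum.reduce(chunks, [], fn chunk, acc ->
    # Appending to iodata is cheap: we nest lists, no binary is copied here.
    [acc, chunk]
  end)

# The binary is built exactly once, at the very end.
result = IO.iodata_to_binary(iodata)
# result == "Hello, world!"
```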
Streams use function thunks, which use more memory, and there is more passing of data around, so the GC will need to run more often. That said, it doesn’t run as often as it could, so it can reclaim it all en masse later; it is using available RAM to make the operation faster. Remember: only use Stream when you have truly unbounded operations (or ones with an unknown bound), or when your overall structure exceeds available memory; otherwise stick with the eager constructs.
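As a small illustration of the thunks point (plain Enum vs Stream on a range, nothing to do with Down itself):

```elixir
# Eager: Enum.map/2 materializes the full intermediate list immediately.
eager = 1..5 |> Enum.map(&(&1 * 2)) |> Enum.sum()

# Lazy: Stream.map/2 only composes functions; nothing is computed until an
# Enum function forces the stream, and each composed thunk carries extra
# allocation overhead compared to the eager pipeline.
lazy_sum = 1..5 |> Stream.map(&(&1 * 2)) |> Enum.sum()

# Both return 30; the difference is when and how the work happens.
```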
And yeah, 33k is basically nothing for that kind of stuff.
It depends: can you use the data piecemeal, or do you need all of the returned data en masse to work on it?
That all depends on the download API being used, then. Which one are you using?
I bet it’s storing to a binary and not being processed anywhere. Binaries are extremely efficient, especially across actors, as they have their own global heap storage when above 64 bytes or so in size (as well as fast appending if there is only one owner, etc.). A stream will end up allocating a whole ton of binaries! But if your download API can write straight to a file, then it can use the base BEAM calls, which use kernel calls to pass the socket information straight to the kernel, so no real allocations need to be done. I’d need to see the code to see what all is being done.
Well, I’m trying to build a generic library to stream HTTP requests. So the main reason to use it is that you can consume the data piece by piece. Another reason could be to guard against attacks with huge files.
I’m not sure I follow you here. The library can use the streaming options of :hackney, :ibrowse, and :httpc. It checks the size of every chunk as it is received; when the sum of the chunk sizes exceeds the given limit, it stops the download. It also checks the header size.
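That size check could look roughly like this (a hypothetical sketch using Enum.reduce_while/3, not Down’s actual code; the chunks and max_size are made up):

```elixir
max_size = 10

# Pretend these chunks arrive one at a time from the socket.
chunks = ["aaaa", "bbbb", "cccc"]

result =
  Enum.reduce_while(chunks, {0, []}, fn chunk, {size, acc} ->
    new_size = size + byte_size(chunk)

    if new_size > max_size do
      # Stop consuming as soon as the accumulated size exceeds the limit.
      {:halt, {:error, :too_large}}
    else
      {:cont, {new_size, [acc, chunk]}}
    end
  end)

# result == {:error, :too_large}, since 4 + 4 + 4 > 10
```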
Exactly, you’re right. When the library receives a new chunk and writes it straight to the file, it only consumes 4KB. When I use a stream like this:
file = Temp.path!() |> File.open!([:write, :delayed_write])
url
|> Down.stream(backend: :httpc)
|> Stream.each(fn c -> IO.binwrite(file, c) end)
|> Stream.run()
File.close(file)