Hello
Why are Tesla and HTTPoison so slow?
Ruby’s “open” downloads files 10-20 times faster.
Or maybe I’m doing something wrong?
How do I download files of about 100-500 MB by URL?
Thanks!
There is a big difference between Ruby and Elixir… With Elixir, it is possible to spawn multiple processes, each downloading one file.
It would be nice to see some code of your HTTPoison usage; for me it is working fine with:
case HTTPoison.get(link) do
  {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
    File.write!(filename, body)
    {:reply, :ok, state}
  ...
end
But what I like the most is running multiple downloads with Task.async/await:
tasks = Enum.map(list, fn({link, filename} = _tuple) ->
  Task.async(fn ->
    :poolboy.transaction(
      :worker,
      &(GenServer.call(&1, {:download, link, filename}, @genserver_call_timeout)),
      @task_async_timeout
    )
  end)
end)
result = Enum.map(tasks, fn(task) -> Task.await(task, @task_async_timeout) end)
The code is incomplete…
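Since the snippet above depends on a poolboy pool and a GenServer worker that are not shown, here is a minimal self-contained sketch of the same idea using Task.async_stream from the standard library instead. It is a different technique from the poolboy setup, not a reconstruction of it. The module name ParallelDownload, the download_fun argument and the default limits are assumptions made for illustration.

```elixir
defmodule ParallelDownload do
  # Runs `download_fun` for each {link, filename} pair, with at most
  # `max_concurrency` downloads in flight at once.
  # `download_fun` is a hypothetical 2-arity function, e.g. one that
  # wraps HTTPoison.get/1 and File.write!/2.
  def run(list, download_fun, max_concurrency \\ 4, timeout \\ 300_000) do
    list
    |> Task.async_stream(
      fn {link, filename} -> download_fun.(link, filename) end,
      max_concurrency: max_concurrency,
      timeout: timeout
    )
    # Results come back in input order; a task that exceeds `timeout`
    # makes the caller exit (the default :on_timeout behaviour).
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Passing the download function in as an argument keeps the sketch independent of any particular HTTP client, so it can be tried with a stub before wiring in real requests.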
I translated an old scraper from Ruby to Elixir. It is 30x faster for my use case.
I use this code:
body = HTTPoison.get!(link, ["User-Agent": "Elixir"], [recv_timeout: 300_000]).body
File.write!(file_path, body)
If the file is big, it might not fit into memory, so it’s better to use the stream_to option with HTTPoison and append to the file using IO.binwrite.
So it’ll be something like:
def download!(file_url, filename) do
  file = if File.exists?(filename) do
    File.open!(filename, [:append])
  else
    File.touch!(filename)
    File.open!(filename, [:append])
  end
  %HTTPoison.AsyncResponse{id: ref} = HTTPoison.get!(file_url, %{}, stream_to: self())
  append_loop(ref, file)
end

defp append_loop(ref, file) do
  receive do
    %HTTPoison.AsyncChunk{chunk: chunk, id: ^ref} ->
      IO.binwrite(file, chunk)
      append_loop(ref, file)
    %HTTPoison.AsyncEnd{id: ^ref} ->
      File.close(file)
    # need something to handle errors like request timeout and such
    # otherwise it will loop forever
    # don't know what httpoison returns in case of an error ...
    # you can inspect `_other` below to find out
    # and match on the error to exit the loop early
    _other ->
      append_loop(ref, file)
  end
end
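Note that a blocking receive like this is a poor fit for a GenServer callback: it blocks the server for the whole download, and the catch-all clause would swallow unrelated messages sent to the server.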
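To show one way of leaving the loop early instead of looping forever, here is a minimal sketch of a receive loop with an error clause and an idle timeout. To keep it runnable without HTTPoison, the messages are stand-in tuples ({:chunk, ref, data}, {:done, ref}, {:error, ref, reason}); these shapes, the module name ChunkLoop and the 30-second default are assumptions. With HTTPoison you would match on %HTTPoison.AsyncChunk{} and %HTTPoison.AsyncEnd{} as above. I believe failures arrive as an %HTTPoison.Error{} message, but check that against the version you use.

```elixir
defmodule ChunkLoop do
  # Writes chunks to `file` until the stream ends, fails, or goes quiet.
  # The tuple messages are hypothetical stand-ins for the structs that
  # HTTPoison sends to the `stream_to` process.
  def append_loop(ref, file, idle_timeout \\ 30_000) do
    receive do
      {:chunk, ^ref, chunk} ->
        IO.binwrite(file, chunk)
        append_loop(ref, file, idle_timeout)

      {:done, ^ref} ->
        File.close(file)

      {:error, ^ref, reason} ->
        File.close(file)
        {:error, reason}
    after
      # No matching message arrived within `idle_timeout` milliseconds.
      idle_timeout ->
        File.close(file)
        {:error, :timeout}
    end
  end
end
```

There is no catch-all clause here, so unrelated messages stay in the mailbox rather than being dropped, and the `after` clause bounds how long the loop waits between chunks.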
Yeah, not using that option for large files will result in a lot of appending to previous chunks. Every now and then the whole previous binary needs to be copied because no “appending” space for binaries is left. This also puts a lot of pressure on the binary-heap garbage collection and causes a peak memory consumption that is at least twice the size of the source file, not to mention all the slowdowns due to the GC runs.
:stream_to, though, causes “instant” handling of received chunks. They still have to be GC’d every now and then, but there is not that much appending and copying going on.
Thanks a lot, now Elixir is faster than ever ))
How can I download and immediately read the file ?
If your code is okay with receiving data in chunks, then put it in append_loop (you would probably then name it receive_loop or something like that). This way you won’t even have to write the contents to the filesystem.
If not, you can File.read or File.stream! the file (by passing the path to it) after append_loop returns.
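As an illustration of processing chunks as they arrive, here is a minimal sketch that folds each chunk into a running SHA-256 digest instead of writing it to a file. As before, the {:chunk, ref, data} and {:done, ref} tuples are hypothetical stand-ins for the HTTPoison async structs, and the module name ChunkHash is made up. Hashing is only an example of chunk-friendly work.

```elixir
defmodule ChunkHash do
  # Folds each chunk into a running SHA-256 state, so the full body
  # is never held in memory or written to disk.
  # The tuple messages are stand-ins for HTTPoison's async messages.
  def receive_loop(ref, state \\ :crypto.hash_init(:sha256)) do
    receive do
      {:chunk, ^ref, chunk} ->
        receive_loop(ref, :crypto.hash_update(state, chunk))

      {:done, ^ref} ->
        # Returns the digest as a lowercase hex string.
        state |> :crypto.hash_final() |> Base.encode16(case: :lower)
    end
  end
end
```

This version waits indefinitely for the next message, so real use would also want an `after` clause or an error clause to stop the loop when the request fails.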
I’m a bit late to this thread, but I’ve created Downstream, a package for streaming downloads with HTTPoison.