Part of my ETL/data-pipelining work involves dealing with input files that can reach 1 to 2 GB (compressed).
When working with Ruby, I usually shell out to tools like axel so the download can be resumed should a disconnection occur, or even accelerated if the source server is throttling in one way or another.
To your knowledge, is there an already built solution to achieve that in Elixir or Erlang?
Solutions I’m considering:
- Shelling out to axel using Porcelain (but Porcelain apparently has issues)
- Implementing a resumable download manager myself on top of an existing Elixir HTTP library
- Wrapping a Rust download manager such as zou with e.g. rustler
If anyone dived into this topic, your feedback is most welcome!
For resumable downloads, I’d think it could be done with most HTTP clients with a bit of work. For example, with HTTPoison you’d write the stream of bytes (the stream_to: self() option) to a temporary file, and only “save” it (move the temporary file to a “permanent” location, similarly to how Plug.Upload works) once the download is complete. If a temporary file already exists when a download process starts, it would read it, calculate the “offset” needed for the Range header, and only download what it still needs.
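The idea above could be sketched roughly like this, using HTTPoison’s async streaming API. The module name, paths, and the `.part` suffix are all made up for illustration; error handling (non-206 responses, servers that ignore Range) is left out:

```elixir
defmodule ResumableDownload do
  # Sketch: resume an interrupted download via a Range header.
  def fetch(url, dest) do
    tmp = dest <> ".part"
    offset = existing_size(tmp)

    # Ask only for the bytes we are missing; a server that supports
    # range requests answers 206 Partial Content.
    headers = if offset > 0, do: [{"Range", "bytes=#{offset}-"}], else: []

    file = File.open!(tmp, [:append, :binary])
    {:ok, _resp} = HTTPoison.get(url, headers, stream_to: self())
    stream_loop(file)

    # Only "save" the file once the download completed.
    File.rename!(tmp, dest)
  end

  defp stream_loop(file) do
    receive do
      %HTTPoison.AsyncChunk{chunk: chunk} ->
        IO.binwrite(file, chunk)
        stream_loop(file)

      %HTTPoison.AsyncEnd{} ->
        File.close(file)

      # Ignore status/header messages in this sketch.
      _other ->
        stream_loop(file)
    end
  end

  # Size of the partial file on disk, or 0 if it does not exist yet.
  def existing_size(path) do
    case File.stat(path) do
      {:ok, %File.Stat{size: size}} -> size
      {:error, :enoent} -> 0
    end
  end
end
```

A real version would also check the response status and restart from scratch if the server doesn’t honor the Range header.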
For accelerated downloads, you’d need to split the file into chunks of some configurable size, download each with the “resumable” downloader described above, then cat them into a single file at the final destination …
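The chunking step could look something like this (names are hypothetical; the total size would come from a Content-Length or HEAD request, which is assumed here):

```elixir
defmodule ChunkedDownload do
  # Byte ranges covering a file of `total` bytes, in `chunk_size` pieces.
  # Each {from, to} pair maps directly to a "Range: bytes=from-to" header.
  def chunk_ranges(total, chunk_size) do
    0
    |> Stream.iterate(&(&1 + chunk_size))
    |> Stream.take_while(&(&1 < total))
    |> Enum.map(fn from -> {from, min(from + chunk_size, total) - 1} end)
  end

  # "cat" the downloaded chunk files into the final destination, in order.
  def assemble(chunk_paths, dest) do
    File.open!(dest, [:write, :binary], fn out ->
      Enum.each(chunk_paths, fn path ->
        # Loads each chunk fully; fine here since chunks are bounded in size.
        IO.binwrite(out, File.read!(path))
      end)
    end)
  end
end
```

For example, `ChunkedDownload.chunk_ranges(1_000, 400)` yields `[{0, 399}, {400, 799}, {800, 999}]`, and each range can be fetched in its own process (e.g. via `Task.async_stream/3`) before being assembled.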
This is what I’d try before reaching out to rust or porcelain.
You can also just use a Task to run a CLI app (via System.cmd/3) to do the same thing.
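A minimal sketch of that approach, assuming axel is on the PATH (the `-n` and `-o` flags are standard axel options for connection count and output file; axel resumes a partial file at the output path on its own):

```elixir
defmodule ShellDownload do
  # Run axel under a Task so the pipeline can await the download
  # (or time out / supervise it like any other process).
  def download(url, dest) do
    Task.async(fn ->
      System.cmd("axel", ["-n", "4", "-o", dest, url], stderr_to_stdout: true)
    end)
    |> Task.await(:infinity)
  end
end
```

`System.cmd/3` returns `{output, exit_status}`, so the caller can match on a `0` exit status to decide whether the download succeeded.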