kbredemeier

Streaming tar files

Hello,

I am trying to download a tarball, extract a gzip-compressed file from it, decompress that file, and write the result to disk in one go. Basically: wget -qO- http://my_server/archive.tar | tar -xf - data/some.tar.gz -O | tar -xzf -

Since erl_tar does not appear to support streams, I decided to use a port running tar for the extraction, but I can't get it to work.

Here is the source of my GenServer that opens the port and is supposed to extract the inner gzip compressed file:

defmodule UnTar do
  use GenServer
  require Logger

  defstruct port: nil, collector_fun: nil, collector_acc: nil

  def start_link(opts \\ []) do
    {server_opts, otp_opts} = Keyword.split(opts, [:into])
    # Pass only the remaining (OTP) options to GenServer, not the :into option.
    GenServer.start_link(__MODULE__, server_opts, otp_opts)
  end

  @impl true
  def init(opts) do
    into = Keyword.fetch!(opts, :into)

    tar = tar_exe()

    tar_args = ["-xf", "-", "data/some.tar.gz", "-O"]

    port_args = [
      {:args, tar_args},
      :use_stdio,
      :binary,
      :exit_status
    ]

    port = Port.open({:spawn_executable, tar}, port_args)
    {collector_acc, collector_fun} = Collectable.into(into)

    {:ok,
     %__MODULE__{
       port: port,
       collector_fun: collector_fun,
       collector_acc: collector_acc
     }}
  end

  def send_chunk(pid, chunk) do
    GenServer.call(pid, {:send_chunk, chunk})
  end

  @impl true
  def handle_call({:send_chunk, chunk}, _from, state) do
    Port.command(state.port, chunk)
    {:reply, :ok, state}
  end

  @impl true
  def handle_info(
        {_port, {:data, data}},
        %{collector_fun: fun, collector_acc: acc} = state
      ) do
    Logger.info("receiving chunk from port")
    new_acc = fun.(acc, {:cont, data})
    {:noreply, %{state | collector_acc: new_acc}}
  end

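  def handle_info({_port, {:exit_status, 0}}, state) do
    Logger.info("exiting normal")
    # Finalize the collectable so the target is flushed and closed.
    state.collector_fun.(state.collector_acc, :done)
    {:stop, :normal, %{state | port: nil}}
  end

  def handle_info({_port, {:exit_status, status}}, state) do
    Logger.info("exiting with #{status}")
    # Let the collectable release its resources without completing.
    state.collector_fun.(state.collector_acc, :halt)
    {:stop, {:exit, status}, %{state | port: nil}}
  end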

  defp tar_exe do
    System.find_executable("tar") || raise("Could not find `tar` executable.")
  end
end

This is how I use the server:

source_stream = File.stream!("path/to/source_archive.tar", [:read, :binary], 512)
target_stream = File.stream!("path/to/target", [:write, :binary])

{:ok, pid} = UnTar.start_link(into: target_stream)

source_stream
|> Stream.map(fn chunk ->
  UnTar.send_chunk(pid, chunk)
end)
|> Stream.run()

At the end, tar prints:

/usr/bin/tar: data/some.tar.gz: Cannot write: Broken pipe
/usr/bin/tar: Exiting with failure status due to previous errors

The resulting file is corrupted and tar does not send any exit code to my server.
Any idea what I am doing wrong?

Edit:
I forgot to add the bytes_or_line arg to the source stream. I wonder whether this has something to do with tar not being able to detect the end of the file. tar uses a block size of 512 bytes, and if I don't provide the block size, tar additionally prints: /usr/bin/tar: A lone zero block at 51035
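
One way to test the block-size theory is to inspect the source archive directly. The following is only a minimal sketch, assuming the standard tar layout of 512-byte blocks with the end of the archive marked by two consecutive zero-filled blocks. The module TarCheck and its functions are hypothetical helpers made up for illustration, not part of erl_tar or any library.

```elixir
defmodule TarCheck do
  # Hypothetical diagnostic helper; assumes the standard 512-byte tar block size.
  @block_size 512

  # True when the data is a whole number of 512-byte blocks.
  def block_aligned?(binary) when is_binary(binary) do
    rem(byte_size(binary), @block_size) == 0
  end

  # True when the data ends with two zero-filled 512-byte blocks,
  # i.e. the end-of-archive marker that tar expects.
  def ends_with_eof_marker?(binary) when is_binary(binary) do
    marker_size = 2 * @block_size
    total_size = byte_size(binary)

    total_size >= marker_size and
      binary_part(binary, total_size - marker_size, marker_size) ==
        :binary.copy(<<0>>, marker_size)
  end
end
```

For a small archive, something like File.read!("path/to/source_archive.tar") |> TarCheck.ends_with_eof_marker?() would show whether the marker is present in the file itself. If both checks return true for the file on disk, the archive is probably fine and the problem more likely lies in how the data reaches tar through the port. Note that this reads the whole file into memory, so it is only suitable as a one-off check.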
