Streaming gunzip does not seem to function as intended

I have a file, data/discogs_20231001_artists.xml.gz:

file data/discogs_20231001_artists.xml.gz
data/discogs_20231001_artists.xml.gz: gzip compressed data, was "discogs_20231001_artists.xml", last modified: Fri Oct  6 08:30:03 2023, max compression, original size modulo 2^32 2250814227

Running this pipeline:

  file_path
  |> File.stream!()
  |> StreamGzip.gunzip()
  |> Enum.into("")

Throws the following error:

** (ErlangError) Erlang error: :data_error
    :zlib.inflate_nif(#Reference<0.3060093016.1584267274.260437>, 8192, 16384, 0)
    (elixir 1.15.7) lib/stream.ex:1612: anonymous fn/5 in Stream.resource/3
    (elixir 1.15.7) lib/stream.ex:1052: Stream.do_transform_inner_enum/7
    (elixir 1.15.7) lib/enum.ex:1553: Enum.reduce_into_protocol/3
    (elixir 1.15.7) lib/enum.ex:1537: Enum.into_protocol/2
    (elixir 1.15.7) lib/enum.ex:984: Enum."-each/2-lists^foreach/1-0-"/2
    discogs.exs:109: (file)

What am I not considering?

Maybe StreamGzip does not recognize the file properly (and has a bug)? Have you tried a smaller, simpler file?

I have tried decompressing it with the gunzip CLI and it works. I was wondering if this is somehow Elixir-related, but looking at the error tells me otherwise.

By default, File.stream! reads the file by :line, and in that mode it normalizes line endings (\r\n sequences become \n).

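This will break most gzip files almost instantly, because any \r\n byte pair in the compressed data loses its \r before it reaches zlib. That would explain the :data_error. :crying_cat_face: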

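You likely want to read the file in fixed-size byte chunks instead, as some of StreamGzip’s tests do, by passing a byte count to File.stream! (for example File.stream!(file_path, [], 2048) on Elixir 1.15).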
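
To make the difference concrete, here is a minimal sketch that uses only the standard library (:zlib rather than StreamGzip). The temp file names, the payloads, and the 2048-byte chunk size are all made up for illustration. The argument order File.stream!(path, [], 2048) matches the Elixir 1.15 in the stack trace; on 1.16 and later that order is deprecated in favor of File.stream!(path, 2048).

```elixir
# Sketch only: file names, payloads and chunk size are illustrative assumptions.
tmp = System.tmp_dir!()

# 1. A tiny binary file that happens to contain the bytes 13, 10 ("\r\n").
bin_path = Path.join(tmp, "crlf_demo.bin")
raw = <<1, 2, 13, 10, 3, 4>>
File.write!(bin_path, raw)

# Default (:line) mode versus reading in 2048-byte chunks.
by_line = bin_path |> File.stream!() |> Enum.into("")
by_bytes = bin_path |> File.stream!([], 2048) |> Enum.into("")

IO.inspect(by_line == raw, label: "line mode preserves bytes")
IO.inspect(by_bytes == raw, label: "byte mode preserves bytes")

# 2. Round trip of gzip data read in fixed-size chunks.
gz_path = Path.join(tmp, "crlf_demo.gz")
payload = String.duplicate("<artist>demo</artist>\n", 1_000)
File.write!(gz_path, :zlib.gzip(payload))

restored =
  gz_path
  |> File.stream!([], 2048)
  |> Enum.into("")
  |> :zlib.gunzip()

IO.inspect(restored == payload, label: "gunzip after byte-mode read")
```

Applied to the pipeline from the question, the only change should be File.stream!([], 2048) in place of File.stream!(). Separately, Enum.into("") will hold the whole decompressed file (at least about 2 GB according to the file output) in memory, so something like Stream.run/1 or streaming into another file may be a better fit.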
