I’ve been using the fantastic `unzip` package (v0.7.2) to work with zip files stored on Amazon S3. It has been great for files that are too large to download locally. Using `Unzip.list_entries/1`, I’ve been able to enumerate all the files in each zip archive and even stream individual files line by line.
However, I recently hit a snag: `.gz` files.
Does anyone know of a way to similarly peek inside gzip (`.gz`) files that are stored remotely?
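Unlike a zip archive, a gzip file has no central directory listing its entries: it is a single compressed stream (normally of one file) that can only be decompressed sequentially from the start. “Peeking” therefore means decompressing lazily and stopping early. Below is a minimal sketch of that idea using Erlang’s built-in `:zlib`, playing roughly the same role as `StreamGzip.gunzip/1`. The module name `GzipPeek` is hypothetical, and it assumes a single-member gzip file.

```elixir
defmodule GzipPeek do
  # Hypothetical helper: lazily gunzip an enumerable of binary chunks.
  # Assumes a single-member gzip file (one header, one compressed body).

  # windowBits 31 = 15 (32 KB window) + 16 (expect a gzip header and trailer).
  @gzip_window_bits 31

  def gunzip(chunks) do
    Stream.transform(
      chunks,
      # Open one inflate context per stream.
      fn ->
        z = :zlib.open()
        :ok = :zlib.inflateInit(z, @gzip_window_bits)
        z
      end,
      # Feed each compressed chunk in and emit whatever bytes come out.
      # These are arbitrary byte runs, not lines.
      fn chunk, z ->
        case IO.iodata_to_binary(:zlib.inflate(z, chunk)) do
          "" -> {[], z}
          out -> {[out], z}
        end
      end,
      # Runs when the stream finishes or is halted early (e.g. by Enum.take/2).
      fn z -> :zlib.close(z) end
    )
  end
end
```

Because the stream is lazy, piping the `ExAws.stream!()` output into `GzipPeek.gunzip/1` and then into `Enum.take/2` stops consuming input as soon as enough decompressed data has been produced; whether the underlying S3 download also stops fetching further ranges depends on how `ExAws.stream!()` handles an early halt.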
Some more info: each line of the `.gz` file contains a JSON object (the file as a whole is not valid JSON, but each line is).
I’ve tried something like this:
```elixir
ExAws.S3.download_file("example-bucket", "path/to/file.txt", :memory)
|> ExAws.stream!()
|> StreamGzip.gunzip()
|> Enum.each(fn l -> Jason.decode!(l) |> IO.inspect() end)
```
The JSON decoding fails, but I’m not sure why. Maybe the decompressed “chunks” aren’t lines?
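If that guess is right, the decompressed stream would need to be re-split on newlines before each piece is handed to `Jason.decode!/1`. Here is a minimal sketch using only the standard library (the `LineSplitter` name is hypothetical): it carries any incomplete trailing line over to the next chunk and flushes it when the input ends.

```elixir
defmodule LineSplitter do
  # Hypothetical helper: turn a stream of arbitrary binary chunks into a
  # stream of complete lines (without the trailing "\n").
  def lines(chunks) do
    chunks
    # Sentinel so the last line is emitted even without a final newline.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" ->
        {[], ""}

      :eof, rest ->
        {[rest], ""}

      chunk, buffer ->
        # Everything before the last "\n" is a complete line; the remainder
        # is a partial line that waits for the next chunk.
        {complete, [rest]} =
          (buffer <> chunk) |> String.split("\n") |> Enum.split(-1)

        {complete, rest}
    end)
  end
end
```

In the pipeline above, this would sit between `StreamGzip.gunzip()` and the `Enum.each/2` that calls `Jason.decode!/1`. Empty lines, if the file has any, would also need filtering (for example with `Stream.reject(&(&1 == ""))`) before decoding.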
Any thoughts or guidance is appreciated!