Exploring contents of a .gz file on S3

I’ve been using the fantastic unzip package (v0.7.2) to help me work with zip files stored on Amazon S3. This has been great when dealing with files that are too large to download locally. Using Unzip.list_entries/1, I’ve been able to enumerate all the files contained in each zip file and even stream single files line by line.

However, I recently hit a snag: .gz files.

Does anyone know of a way to similarly peek inside gzip files that are stored remotely?

Some more info: the .gz file contains a JSON object on each line (the file as a whole is not valid JSON, but each line is valid JSON).

I’ve tried something like this:

ExAws.S3.download_file("example-bucket", "path/to/file.txt", :memory)
|> ExAws.stream!()
|> StreamGzip.gunzip()
|> Enum.each(fn l -> Jason.decode!(l) |> IO.inspect() end)

The JSON decoding fails, but I’m not sure why. Maybe it’s because the “chunks” aren’t lines?
Any thoughts or guidance is appreciated!

Try a streaming variant of Jason:


JSON streaming comprises communications protocols to delimit JSON objects built upon lower-level stream-oriented protocols (such as TCP), that ensures individual JSON objects are recognized, when the server and clients use the same one (e.g. implicitly coded in). This is necessary as JSON is a non-concatenative protocol (the concatenation of two JSON objects does not produce a valid JSON object).

Source: Wikipedia.
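Since each line of the file is its own JSON document, another option might be to re-chunk the decompressed stream into lines before decoding, which fits the guess in the question that the chunks don't line up with line boundaries. Below is a minimal sketch of that idea using only the standard library. The `LineChunker` module and its `lines/1` function are hypothetical names made up for this example, and it assumes the upstream stream emits binaries and that lines are separated by `\n`.

```elixir
defmodule LineChunker do
  # Hypothetical helper (not part of any library): turns a stream of
  # arbitrary binary chunks into a stream of complete lines.
  # Assumes every chunk is a binary and lines are separated by "\n".
  def lines(chunks) do
    chunks
    # Append a sentinel so the leftover buffer can be flushed at the end.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      # End of input with nothing buffered: emit nothing.
      :eof, "" ->
        {[], ""}

      # End of input with a final line that had no trailing newline.
      :eof, buffer ->
        {[buffer], ""}

      # Regular chunk: everything before the last "\n" is complete,
      # the remainder is carried over to the next chunk.
      chunk, buffer when is_binary(chunk) ->
        {complete, [rest]} =
          (buffer <> chunk)
          |> String.split("\n")
          |> Enum.split(-1)

        {complete, rest}
    end)
  end
end
```

In the pipeline from the question, this would slot in between `StreamGzip.gunzip()` and `Enum.each/2`, so that `Jason.decode!/1` only ever sees whole lines. If the file can contain blank lines, filtering out empty strings before decoding would probably be needed as well.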