I have a zip file containing CSVs on a remote S3 cellar. Using Unzip, I can get a stream of the CSV file that I would like to decode, like so:
```elixir
aws_s3_config =
  ExAws.Config.new(:s3,
    access_key_id: ["xxx", :instance_role],
    secret_access_key: ["xxx", :instance_role]
  )

file = new(zip_name, bucket_name, aws_s3_config)
{:ok, unzip} = Unzip.new(file)
stream = Unzip.file_stream!(unzip, file_name)
```
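(For context: `new/3` above is not an Unzip function, it comes from a small helper module wrapping the S3 object so Unzip can do ranged reads on it through the `Unzip.FileAccess` protocol, roughly like the S3 example in Unzip's docs. The module name and exact return shapes below are my own sketch, not the library's API:)

```elixir
# Hypothetical helper module behind the `new/3` call above.
defmodule S3File do
  defstruct [:path, :bucket, :s3_config]

  def new(path, bucket, s3_config),
    do: %__MODULE__{path: path, bucket: bucket, s3_config: s3_config}
end

defimpl Unzip.FileAccess, for: S3File do
  # Total object size, so Unzip can locate the zip's central directory.
  def size(file) do
    %{headers: headers} =
      ExAws.S3.head_object(file.bucket, file.path)
      |> ExAws.request!(file.s3_config)

    size =
      headers
      |> Enum.find(fn {k, _} -> String.downcase(k) == "content-length" end)
      |> elem(1)
      |> String.to_integer()

    {:ok, size}
  end

  # Read `length` bytes at `offset` with an HTTP Range request.
  def pread(file, offset, length) do
    %{body: body} =
      ExAws.S3.get_object(file.bucket, file.path,
        range: "bytes=#{offset}-#{offset + length - 1}"
      )
      |> ExAws.request!(file.s3_config)

    {:ok, body}
  end
end
```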
as explained in the doc.
Now I would like to consume that stream by decoding it with CSV. So I try:

```elixir
stream |> CSV.decode() |> Enum.take(1)
```

and get an error:

```
** (FunctionClauseError) no function clause matching in CSV.Decoding.Preprocessing.Lines.starts_sequence?/5
```
If I write the content of my CSV to disk and then read it back, it works fine:
```elixir
# write the file on disk
stream |> Stream.into(File.stream!("stops.txt")) |> Stream.run()

# then read and decode it
File.stream!("stops.txt") |> CSV.decode() |> Enum.take(1)
```
I get the desired result, the first row of the CSV file:

```
[ok: ["\uFEFFstop_id", "stop_name", "stop_lat", "stop_lon", "location_type"]]
```
The difference I see is that `Unzip.file_stream!` and `File.stream!("stops.txt")` do not chunk the stream the same way: Unzip seems to emit binary chunks of about 65 KB, while `File.stream!` emits one line per element.
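If the chunking really is the problem, I imagine I could re-chunk the byte stream into lines myself before handing it to CSV, something like this (untested sketch; `LineStream` is my own helper name, not part of any library):

```elixir
defmodule LineStream do
  # Turn a stream of arbitrary binary chunks into a stream of lines
  # (each ending in "\n"), mimicking File.stream!'s default behaviour.
  def lines(binary_stream) do
    binary_stream
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, acc ->
        # flush a final line that has no trailing newline
        if acc == "", do: {[], ""}, else: {[acc], ""}

      chunk, acc ->
        # the last fragment may be an incomplete line: carry it over
        [rest | complete] =
          (acc <> chunk) |> String.split("\n") |> Enum.reverse()

        {complete |> Enum.reverse() |> Enum.map(&(&1 <> "\n")), rest}
    end)
  end
end

# stream |> LineStream.lines() |> CSV.decode() |> Enum.take(1)
```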
How can I solve this, without writing the file to disk as an intermediary step?
Thanks!