fchabouis

Stream CSV file from a remote zip on S3

I have a zip file, containing CSVs, on a remote S3 cellar.

Using Unzip, I can get a stream of the CSV file I would like to decode, like so:

    aws_s3_config =
      ExAws.Config.new(:s3,
        access_key_id: ["xxx", :instance_role],
        secret_access_key: ["xxx", :instance_role]
      )

    file = new(zip_name, bucket_name, aws_s3_config)
    {:ok, unzip} = Unzip.new(file)
    stream = Unzip.file_stream!(unzip, file_name)

as explained in the doc.

Now I would like to consume that stream by reading it with CSV.
So I try stream |> CSV.decode |> Enum.take(1) and get an error: ** (FunctionClauseError) no function clause matching in CSV.Decoding.Preprocessing.Lines.starts_sequence?/5

If I write the content of my CSV to disk and then read it back, it works fine:

    # write the file to disk
    stream |> Stream.into(File.stream!("stops.txt")) |> Stream.run()
    # then read and decode it
    File.stream!("stops.txt") |> CSV.decode |> Enum.take(1)

I get the desired result, the first row of the CSV file: [ok: ["\uFEFFstop_id", "stop_name", "stop_lat", "stop_lon", "location_type"]]

The difference I see is that Unzip.file_stream! and File.stream!("stops.txt") do not stream the file the same way. Unzip seems to emit chunks of 65k, while File.stream! streams line by line.
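
To illustrate the difference described above, here is a minimal sketch using only the standard library. The sample data and the 10-byte chunk size are made up for the illustration; StringIO stands in for the real source, so this is not the actual behaviour of Unzip, only the same kind of mismatch.

```elixir
# Made-up sample data standing in for the real CSV file.
csv = "stop_id,stop_name\n1,Main\n2,Oak\n"

# Fixed-size reads: chunk boundaries ignore line endings.
{:ok, chunk_pid} = StringIO.open(csv)
chunks = chunk_pid |> IO.binstream(10) |> Enum.to_list()

# Line-oriented reads: every element is one complete line.
{:ok, line_pid} = StringIO.open(csv)
lines = line_pid |> IO.binstream(:line) |> Enum.to_list()

IO.inspect(chunks, label: "chunks")
IO.inspect(lines, label: "lines")
```

The chunked version cuts the header in the middle of a field, which is the kind of input a line-oriented decoder is not written to handle.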

How can I solve this, without writing the file to disk as an intermediary step?
Thanks!

Most Liked

ahamez

Hello,

I don’t know if this will help, as I’m using StreamGzip rather than Unzip, in combination with NimbleCSV, but maybe it will give you a hint.

I have the following function that returns a stream for an object downloaded from S3:

    defp get_object_stream(object) do
      {:ok, io_pid} = StringIO.open(object)

      io_pid
      |> IO.binstream(4096)
      |> StreamGzip.gunzip()
      |> NimbleCSV.RFC4180.to_line_stream()
    end

In my case, the trick was to use to_line_stream.

I can then use this stream like this:

    object
    |> get_object_stream()
    |> NimbleCSV.RFC4180.parse_stream()

As you can see, I’m not streaming directly from S3, since I first download the object into memory. But if you already have something that can stream from S3, you would just have to replace the part that builds the stream from the in-memory string with your stream from S3.

LostKobrakai

List.flatten(c) |> Enum.join("") would probably be better replaced with IO.iodata_to_binary/1.
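
A small sketch of that suggestion, using a made-up chunk shaped like nested iodata (the real chunks may look different). For lists of binaries both approaches give the same result, but they can differ when the iodata contains raw bytes as integers.

```elixir
# A made-up chunk shaped like nested iodata (lists of binaries).
chunk = ["stop_id", [",", "stop_name"], "\n"]

via_join = chunk |> List.flatten() |> Enum.join("")
via_iodata = IO.iodata_to_binary(chunk)

# Integers inside iodata are bytes. Enum.join/2 turns them into decimal
# text instead, so the two approaches disagree on this input.
bytes = [?a, "b"]
bytes_via_join = bytes |> List.flatten() |> Enum.join("")
bytes_via_iodata = IO.iodata_to_binary(bytes)

IO.inspect({via_join, via_iodata})
IO.inspect({bytes_via_join, bytes_via_iodata})
```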

akash-akya

Echoing what @ahamez has already mentioned, the issue seems to be that CSV.decode expects a stream of lines, but Unzip.file_stream! returns a stream of blobs. You can convert the stream of blobs to a stream of lines yourself, or you can use NimbleCSV as already mentioned:

    Unzip.file_stream!(unzip, file_name)
    |> NimbleCSV.RFC4180.to_line_stream()
    |> NimbleCSV.RFC4180.parse_stream()
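
For the "convert it yourself" route, here is a minimal sketch using only the standard library. LineStream and from_chunks/1 are hypothetical names, not part of Unzip, CSV, or NimbleCSV. The sketch assumes each chunk is a binary or iodata, and it splits on "\n" only, so it does not account for newlines inside quoted CSV fields.

```elixir
defmodule LineStream do
  @moduledoc "Hypothetical helper: re-chunks a stream of blobs into lines."

  # Takes a stream of arbitrarily sized chunks and returns a lazy stream
  # in which every element is one line, with its trailing "\n" kept.
  def from_chunks(chunks) do
    chunks
    # Append a sentinel so the last, unterminated line is flushed.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" ->
        {[], ""}

      :eof, rest ->
        {[rest], ""}

      chunk, acc ->
        # Assumption: chunks may be iodata, so normalise to a binary first.
        parts = String.split(acc <> IO.iodata_to_binary(chunk), "\n")
        # Everything but the last part is a complete line; the last part
        # is carried over until the next chunk arrives.
        {lines, [rest]} = Enum.split(parts, -1)
        {Enum.map(lines, &(&1 <> "\n")), rest}
    end)
  end
end

# Made-up chunks whose boundaries fall in the middle of lines.
chunks = ["stop_id,st", "op_name\n1,", "Main\n2,Oak"]
result = chunks |> LineStream.from_chunks() |> Enum.to_list()
IO.inspect(result)
```

With something like this in place, Unzip.file_stream!(unzip, file_name) |> LineStream.from_chunks() should give a line-oriented stream of the shape File.stream! produces, which could then be piped into CSV.decode.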
