Streaming from Ecto, through CSV, Zip, then up to S3... w/o going through a temp file

Hi there,

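I’ve got a data export with the following flow: stream rows out of Ecto, encode them as CSV, zip the result, write it to a temp file, then upload that file to S3.

I’m using Ecto, CSV.encode/2, Zstream.zip/2 and ExAws.S3.Upload.stream_file/2.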

I tried at first to avoid the temp file (I’m on Heroku, so not entirely ideal to use the filesystem, though passable) but I wasn’t able to get anything working. I tried writing my own function that used upload_chunk!/3 but it seems the first entry in the zip stream is not a binary.
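For anyone hitting the same "not a binary" symptom: one possible cause (an assumption here, not confirmed in this thread) is that the stream emits iodata, i.e. nested lists of binaries, rather than flat binaries. A minimal sketch of normalizing such a stream follows. The sample `iodata_stream` is a made-up stand-in, not real Zstream.zip/2 output.

```elixir
# Hypothetical stand-in for a stream whose elements are iodata
# (nested lists of binaries) rather than flat binaries.
iodata_stream = [["PK", <<3, 4>>], "row1\n", [["row", "2"], "\n"]]

# IO.iodata_to_binary/1 flattens each element into a single binary,
# and leaves elements that are already binaries unchanged.
binary_stream = Stream.map(iodata_stream, &IO.iodata_to_binary/1)

chunks = Enum.to_list(binary_stream)
```

After this step every element is a plain binary, which is what a chunk-upload function would normally expect as a request body.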

Thanks for any guidance or advice you can spare!


I looked into Packmatic first, but its focus seems to be on downloads where the source is a URL or a local file. There’s the dynamic source, which I didn’t get around to trying, but according to the docs it’s "ultimately fulfilled by pulling content from either a File or an URL".

You can probably implement a custom source, which is neither file nor url.

If you’re ok with gzip I’ve used https://hexdocs.pm/stream_gzip/readme.html before.

Seems like truly custom sources are only in the development branch: https://github.com/evadne/packmatic/commit/9439e7e2967aea65e872785334966810792fc675

If you make a GitHub repo and are willing to provide sample data (anonymised / mocked) and access to S3[-compatible] storage then I’d gladly pair with you to find a solution.


Thank you, that’s very kind! I’ll see if I can spin something up in the next few days.

Yeah, gzip would be fine. What are you streaming into to stream up to S3?

Pretty sure you can pipe the gzip stream directly into: https://hexdocs.pm/ex_aws_s3/ExAws.S3.html#upload/4

The examples use https://hexdocs.pm/ex_aws_s3/ExAws.S3.Upload.html#stream_file/2 to create the stream, but all that does is call a regular File.stream! with the chunk size set to the minimum (5 MB): https://github.com/ex-aws/ex_aws_s3/blob/v2.0.2/lib/ex_aws/s3/upload.ex#L53.

Do note that your chunks need to be at least 5 MB each (the last one may be smaller), which is the minimum part size for an S3 multipart upload. If the whole compressed file is smaller than that, you should just do a regular ExAws.S3.put_object call.
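A compression stream usually emits pieces much smaller than 5 MB, so they need regrouping before the upload. Here is a minimal sketch of one way to do that with Stream.chunk_while/4. The module name `RechunkSketch`, the bucket and the object key are made up for illustration, and the demo uses a tiny threshold so the behaviour is easy to see.

```elixir
defmodule RechunkSketch do
  # Hypothetical helper: regroups a stream of binaries (or iodata) into
  # binaries of at least `min_bytes` each. Only the final chunk may be smaller.
  def rechunk(stream, min_bytes) do
    Stream.chunk_while(
      stream,
      {[], 0},
      fn data, {acc, size} ->
        size = size + IO.iodata_length(data)
        acc = [acc, data]

        if size >= min_bytes do
          # Enough bytes buffered: emit one binary and reset the buffer.
          {:cont, IO.iodata_to_binary(acc), {[], 0}}
        else
          {:cont, {acc, size}}
        end
      end,
      fn
        # Nothing left over: emit nothing at the end.
        {_acc, 0} -> {:cont, {[], 0}}
        # Flush the remainder as the final (possibly undersized) chunk.
        {acc, _size} -> {:cont, IO.iodata_to_binary(acc), {[], 0}}
      end
    )
  end
end

# Demo with a 4-byte threshold instead of 5 MB.
demo = ["ab", "cd", "e", "fgh", "i"] |> RechunkSketch.rechunk(4) |> Enum.to_list()
# => ["abcd", "efgh", "i"]

# Assumed usage with ExAws (bucket and key are placeholders, not run here):
#
#   gzip_stream
#   |> RechunkSketch.rechunk(5 * 1024 * 1024)
#   |> ExAws.S3.upload("my-bucket", "exports/data.csv.gz")
#   |> ExAws.request!()
```

Buffering as iodata and flattening only when a chunk is emitted avoids repeated binary concatenation. Memory use stays at roughly one chunk per concurrent upload.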
