Greetings fellow humans,
First, I’ll tell you why I’m not using pre-signed URLs: this is for sensitive files, and I want to control access at a much more granular level than signed URLs allow.
I’ve been looking at
For now, I’m planning to use cloud functions, and want to keep memory limits as low as possible.
So, my objective is to stream a file from S3, to the client, while keeping memory usage constant and not running out.
It looks like with ex_aws_s3 I can do:
|> Enum.reduce_while(Plug.Conn.send_chunked(), &()...)
but I’m worried about not being able to control the memory usage. Could that be a problem?
Another option @philss mentioned in aws-elixir in this issue is using a range option, which I guess would give more control.
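To make the range idea concrete, here is a minimal sketch of how the Range header values could be computed, assuming the object size is already known (for example from a HEAD request). `RangeSketch` and its function are hypothetical names, not part of ex_aws_s3 or aws-elixir, and the sketch only builds the header strings; it does not talk to S3.

```elixir
defmodule RangeSketch do
  @moduledoc "Hypothetical helper: splits an object into HTTP Range header values."

  # Returns a lazy stream of "bytes=first-last" strings covering `size` bytes.
  # HTTP byte ranges are inclusive on both ends, hence the `- 1`.
  def headers(size, chunk_size) when size > 0 and chunk_size > 0 do
    0
    |> Stream.iterate(&(&1 + chunk_size))
    |> Stream.take_while(&(&1 < size))
    |> Stream.map(fn first ->
      last = min(first + chunk_size, size) - 1
      "bytes=#{first}-#{last}"
    end)
  end
end
```

Because the result is a stream, only one range is materialized at a time, so the chunk size becomes the knob for how much of the file is held in memory per request.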
What do you fine elixir folks think I should do?
You can use a pre-signed URL with a signature that has an insanely short period of validity. Sign the URL every time. Should be much simpler. Having traffic go through your host can create a massive headache when the time comes to pay your bills!
Another issue with the pre-signed URL is that SSE-C (custom keys for server-side encryption) requires headers to be sent to AWS with the request, so you can’t just redirect clients to the URL or give the link out (and it also exposes the key to the downloader, although that’s not the end of the world).
I haven’t found a good way to initiate that in the browser without some janky web workers acting as a proxy or something, but my JS expertise is severely lacking these days. I’ve fallen so far behind on frontend stuff
- Generate pre-signed URL but keep it server-side
- Serve a stream to the client
- When the client wants more data, you go and download a few megs from AWS and serve that down the stream
This way you can use whatever reads chunks off HTTP, and you have more control over how big each chunk sent to the user should be, etc.
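A rough sketch of the steps above, under some assumptions: `fetch` is a hypothetical function that stands in for a ranged GET against the server-side pre-signed URL, and `sink` is a hypothetical function that pushes one chunk to the client. Neither name comes from a library; the point is only that the next range is fetched after the previous chunk has been sent, so at most one chunk is in flight.

```elixir
defmodule ProxySketch do
  # Lazily yields the object as binaries of at most `chunk_size` bytes.
  # `fetch` is a hypothetical (offset, length) -> binary function.
  def chunks(fetch, size, chunk_size) do
    Stream.resource(
      fn -> 0 end,
      fn
        offset when offset >= size ->
          {:halt, offset}

        offset ->
          len = min(chunk_size, size - offset)
          {[fetch.(offset, len)], offset + len}
      end,
      fn _offset -> :ok end
    )
  end

  # Sends each chunk through `sink` and stops at the first error.
  # `sink` is a hypothetical (acc, chunk) -> {:ok, acc} | {:error, reason} function.
  def pump(stream, acc, sink) do
    Enum.reduce_while(stream, {:ok, acc}, fn chunk, {:ok, acc} ->
      case sink.(acc, chunk) do
        {:ok, acc} -> {:cont, {:ok, acc}}
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
  end
end
```

In a Plug app, `&Plug.Conn.chunk/2` has the right shape for `sink` once `Plug.Conn.send_chunked/2` has been called on the conn, since it returns `{:ok, conn}` or `{:error, reason}`. Halting on the error means no further ranges are fetched after the client disconnects.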
Also it should be possible to specify the key even in a pre-signed GET request for S3.
I’ll pay $200 USD to anybody who can tell me how to get a working pre-signed GET URL with SSE-C that requires no headers. I’ve wasted so much of my spare time on this already
ExAws.S3.download_file/4: Defaults to a concurrency of 8, chunk size of 1MB, and a timeout of 1 minute.
So you should be ok. The stream isn’t started until the customer starts downloading anyway.
You can also try FileSaver.js (eligrey/FileSaver.js on GitHub), an HTML5 saveAs() implementation, for the saving part too.
I’m probably going to start with a cap of 1GB filesize.
Related to FileSaver, I was looking at StreamSaver.js (jimmywarting/StreamSaver.js on GitHub), which writes streams to the filesystem directly and asynchronously, but thought that a service worker seemed a bit janky. Maybe it’s better than streaming through my own server, though.
The bottom line IMO is whether you MUST use SSE-C and if it is acceptable in your use case to expose the headers to the client.
Not an ideal solution, but use Nginx as a reverse proxy for S3 and extract header values from the URL params (or any other part of the query string).
zip from S3 through locally running app, through browser:
|> Enum.reduce_while(Plug.Conn.send_chunked/3, &(...))
The memory usage wasn’t frightening, so might be okay, for now.
This is on a 6-core 2019 iMac with 32GB RAM. Not sure if there are any significant tradeoffs happening with memory usage if I go to a 1-2 vcpu environment…
Are the SSE-C keys a requirement from your customers or your own architecture? And how many concurrent downloads are you expecting?
A crazy idea, but it’s worth sharing… (only valid depending on your responses above)
You can spin up new S3 buckets on demand, copying files using your full security model and deleting those buckets after the download ends or after a regular interval.
This way, the clients can download files using a much more relaxed security model (but still secure) pre-signed URLs.
There is a default limit of 100 buckets per account unless additional quota is requested. Creation of a new S3 bucket takes a few seconds to a few minutes.
The concurrency of 8 for download_file is going to increase the memory size (and likely the amount of data references). If the goal is to stream, I would try to go with Finch (and/or Mint) + Range queries.
You can increase the maximum limit to 1000 by sending them an email (at least that’s stated in the documentation).
Depending on the implementation, you can do the work in the background and notify the user when everything’s ready. I know it’s far from ideal, but it’s a possible solution given the constraints.
Cool! I made a small PoC using Finch that kept the memory usage even lower (ends in 4m05). It’s not a fair comparison because it’s not streaming to the client.
Here is the code: download_manager.ex in philss/aws-s3-stream-download-poc on GitHub (there is a custom AWS.HTTPClient implementation using Finch inside the project).
This is excellent!
I’m definitely going to try this out soon, too.
Some months ago I wrote a stream downloader with Mint.
I had the opposite problem: stream from clients to S3. I needed to control the chunk size, since S3 multipart uploads require every part except the last to be at least 5MB.
That was accomplished with:
|> ExAws.S3.upload(s3_bucket, filename, opts)
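The original pipeline isn’t shown in full here, so this is only a guess at the re-chunking step that would sit in front of that `ExAws.S3.upload` call: a sketch that regroups a stream of small binaries into parts of at least a minimum size. `RechunkSketch` is a hypothetical name, and the 5MB default reflects the S3 multipart minimum rather than anything from the poster’s code.

```elixir
defmodule RechunkSketch do
  # S3 multipart minimum part size (the last part may be smaller).
  @min_part 5 * 1024 * 1024

  # Regroups a stream of binaries so every emitted binary except
  # possibly the last is at least `min_size` bytes.
  def parts(stream, min_size \\ @min_part) do
    Stream.chunk_while(
      stream,
      {[], 0},
      fn bin, {buf, size} ->
        buf = [buf, bin]
        size = size + byte_size(bin)

        if size >= min_size do
          {:cont, IO.iodata_to_binary(buf), {[], 0}}
        else
          {:cont, {buf, size}}
        end
      end,
      fn
        # Nothing buffered: emit no trailing part.
        {_buf, 0} -> {:cont, {[], 0}}
        # Flush whatever is left as the final, possibly short, part.
        {buf, _size} -> {:cont, IO.iodata_to_binary(buf), {[], 0}}
      end
    )
  end
end
```

The buffer is kept as iodata and only flattened when a part is emitted, so memory stays around one part in size plus whatever the incoming chunk adds.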
Really cool, thanks for sharing this.
Is this how I use AWS.S3 with SSE-C?
I’m getting a 403, and can’t tell if I’m using the function arguments wrong or if I’m deriving the key incorrectly, so I don’t know where to waste my time.
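One way to rule out the key derivation is to build the three SSE-C header values separately and compare them with what is actually being sent. This is a minimal sketch, assuming the key is the raw 32-byte AES-256 key (not a hex or Base64 string). `SseCSketch` is a hypothetical name, and how these values are passed into AWS.S3 is not shown because I’m not sure of the exact argument shape.

```elixir
defmodule SseCSketch do
  # `key` must be the raw 32-byte key. S3 expects the key Base64-encoded,
  # and the MD5 header to be the Base64 of the binary MD5 digest of the
  # raw key (not of the Base64 text, and not a hex digest).
  def headers(key) when is_binary(key) and byte_size(key) == 32 do
    [
      {"x-amz-server-side-encryption-customer-algorithm", "AES256"},
      {"x-amz-server-side-encryption-customer-key", Base.encode64(key)},
      {"x-amz-server-side-encryption-customer-key-MD5",
       Base.encode64(:crypto.hash(:md5, key))}
    ]
  end
end
```

Hashing the Base64 or hex form of the key instead of the raw bytes is an easy mistake to make here. If these values match what is being sent, the function arguments or the request signing are the more likely place to look.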