First, I’ll tell you why I’m not using pre-signed URLs: this is for sensitive files that I want to control access to at a much more granular level than signed URLs allow.
I’ve been looking at aws-elixir and ex_aws_s3.
For now, I’m planning to use cloud functions, and want to keep memory limits as low as possible.
So my objective is to stream a file from S3 to the client while keeping memory usage constant, without running out.
You can use a pre-signed URL with a signature that has an insanely short period of validity. Sign the URL every time. Should be much simpler. Having traffic go through your host can create a massive headache when the time comes to pay your bills!
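Something like this with ex_aws_s3 (which you’re already looking at), just as a sketch; the bucket and key are placeholders and `expires_in` is in seconds:

```elixir
# Sketch: a pre-signed GET URL that expires 60 seconds after it is signed.
{:ok, url} =
  ExAws.Config.new(:s3)
  |> ExAws.S3.presigned_url(:get, "my-bucket", "path/to/file", expires_in: 60)
```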
Another issue with pre-signed URLs: using SSE-C (customer-provided keys for server-side encryption) requires headers to be sent to AWS with the request, so you can’t just redirect the user to the URL or hand the link out (and it also exposes the key to the downloader, although that’s not the end of the world).
I haven’t found a good way to initiate that in the browser without some janky web workers acting as a proxy or something, but my JS expertise is severely lacking these days. I’ve fallen so far behind on frontend stuff.
I’ll pay $200 USD to anybody who can tell me how to get a working pre-signed GET URL with SSE-C that requires no headers. I’ve wasted so much of my spare time on this already.
ExAws.S3.download_file/4: Defaults to a concurrency of 8, chunk size of 1MB, and a timeout of 1 minute.
So you should be ok. The stream isn’t started until the customer starts downloading anyway.
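With those defaults written out explicitly it looks roughly like this (bucket, key, and destination are placeholders; double-check the option names against your ex_aws_s3 version):

```elixir
# Sketch: ExAws.S3.download_file/4 with its documented defaults made explicit.
# Chunks are fetched as separate ranged requests, so peak memory stays around
# chunk_size * max_concurrency rather than the size of the whole file.
"my-bucket"
|> ExAws.S3.download_file("path/to/file", "/tmp/downloaded-file",
  max_concurrency: 8,
  chunk_size: 1_048_576,
  timeout: 60_000
)
|> ExAws.request!()
```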
If you want to avoid the server-side round-trip, then you will have to expose the headers and start the download on the client with some JavaScript. You can use XMLHttpRequest to arrange the headers and get a blob back, then make a Data URI out of the blob, and serve it to the user… this works as long as the file is not large.
The memory usage wasn’t frightening, so it might be okay for now.
This is on a 6-core 2019 iMac with 32GB RAM. Not sure if there are any significant memory-usage tradeoffs when moving to a 1-2 vCPU environment…
Are the SSE-C keys a requirement from your customers or your own architecture? And how many concurrent downloads are you expecting?
A crazy idea, but it’s worth sharing… (only valid depending on your responses above)
You can spin up new S3 buckets on demand, copy the files in using your full security model, and delete those buckets after the download ends or on a regular interval.
This way, clients can download the files via pre-signed URLs under a much more relaxed (but still secure) security model.
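Roughly, as a sketch with ex_aws_s3 (names are placeholders; error handling and the delayed cleanup are left out):

```elixir
# Sketch of the throwaway-bucket idea: copy the object into a temporary bucket,
# hand out a short-lived pre-signed URL, and delete the bucket later.
def share_via_temp_bucket(src_bucket, key, region \\ "us-east-1") do
  temp_bucket = "tmp-dl-" <> Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)

  ExAws.S3.put_bucket(temp_bucket, region) |> ExAws.request!()

  # If the source object uses SSE-C, the copy also needs the
  # x-amz-copy-source-server-side-encryption-customer-* headers.
  ExAws.S3.put_object_copy(temp_bucket, key, src_bucket, key) |> ExAws.request!()

  {:ok, url} =
    ExAws.Config.new(:s3)
    |> ExAws.S3.presigned_url(:get, temp_bucket, key, expires_in: 300)

  # Later (e.g. from a scheduled job): delete the object, then the now-empty bucket.
  url
end
```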
There is a default limit of 100 buckets per account unless additional quota is requested. Creation of a new S3 bucket takes a few seconds to a few minutes.
The concurrency of 8 for download_file is going to increase memory usage (and likely the amount of data being referenced). If the goal is to stream, I would try Finch (and/or Mint) plus Range requests.
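Something along these lines, as a rough sketch. It assumes a `MyApp.Finch` pool in your supervision tree and that the caller already has an authorized URL (pre-signed, or with SigV4/SSE-C headers passed in as `extra_headers`):

```elixir
defmodule MyApp.S3RangeStream do
  @moduledoc """
  Sketch: stream an S3 object to a Plug connection in fixed-size Range chunks,
  so memory stays bounded by one chunk regardless of the object's size.
  """

  @chunk_size 1_048_576  # 1 MiB per Range request

  def stream_to_conn(conn, url, total_size, extra_headers \\ []) do
    conn = Plug.Conn.send_chunked(conn, 200)

    0..(total_size - 1)//@chunk_size
    |> Enum.reduce_while(conn, fn first, conn ->
      last = min(first + @chunk_size - 1, total_size - 1)
      headers = [{"range", "bytes=#{first}-#{last}"} | extra_headers]

      case Finch.request(Finch.build(:get, url, headers), MyApp.Finch) do
        {:ok, %Finch.Response{status: status, body: body}} when status in [200, 206] ->
          case Plug.Conn.chunk(conn, body) do
            {:ok, conn} -> {:cont, conn}
            {:error, _reason} -> {:halt, conn}  # client went away
          end

        _error ->
          {:halt, conn}
      end
    end)
  end
end
```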
You can increase the maximum limit to 1000 by sending them an email (at least that’s stated in the documentation).
Depending on the implementation, you can do the work in the background and notify the user when everything’s ready. I know it’s far from ideal, but it’s a possible solution given the constraints.
Cool! I made a small PoC using aws-elixir and Finch that kept the memory usage even lower (ends in 4m05). It’s not a fair comparison because it’s not streaming to the client.
I’m getting a 403, and I can’t tell if I’m using the function arguments wrong or deriving the key incorrectly, so I don’t know where to waste my time.
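For reference, the SSE-C headers S3 expects on the request look like this (a sketch; `raw_key` stands in for the actual 32-byte key, and a 403 can simply mean the key or its MD5 doesn’t match what the object was encrypted with):

```elixir
# raw_key must be the exact 32-byte (256-bit) key the object was encrypted with;
# the key is sent base64-encoded and the MD5 is taken over the *raw* key bytes.
def sse_c_headers(raw_key) when byte_size(raw_key) == 32 do
  [
    {"x-amz-server-side-encryption-customer-algorithm", "AES256"},
    {"x-amz-server-side-encryption-customer-key", Base.encode64(raw_key)},
    {"x-amz-server-side-encryption-customer-key-MD5",
     Base.encode64(:crypto.hash(:md5, raw_key))}
  ]
end
```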