I was able to solve the problem of proxying and encrypting uploads to S3 in NodeJS with a memory-efficient stream. This is easier in NodeJS because an Express controller can receive the file contents as a stream, and that stream can be transformed and piped up to S3 with predictable memory consumption regardless of file size. See busboy.
I want the same behavior in Elixir - but this requires a custom multipart parser!
I have created a POC that, instead of writing the uploaded byte chunks to a file under `tmp/`, uploads the file to S3.
I started from the built-in `Plug.Parsers.MULTIPART` and modified it to achieve the above. It is structurally very similar to the normal multipart parser.
If the file is smaller than 5 MB it is persisted with a simple `s3.put_object`; when it is larger than 5 MB it is persisted via an S3 multipart upload.
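As a minimal sketch of that decision, assuming the first read of the file part comes back tagged `:ok` (nothing more to read) or `:more` (more data follows), in the style of the tuples returned by `Plug.Conn.read_part_body/2`. The module and function names are hypothetical and not part of Plug or any S3 client:

```elixir
defmodule UploadStrategy do
  # Hypothetical helper for illustration only.
  # 5 MiB, the smallest part size S3 accepts for non-final multipart parts.
  @min_part_size 5 * 1024 * 1024

  # The whole file arrived in one read and is small: a single PUT is enough.
  def choose({:ok, body}) when byte_size(body) < @min_part_size, do: :put_object

  # The whole file arrived in one read but is large: use a multipart upload.
  def choose({:ok, _body}), do: :multipart_upload

  # More data follows, so the total size is unknown: use a multipart upload.
  def choose({:more, _body}), do: :multipart_upload
end
```

The point of deciding on the first read is that the total size of a streamed upload is not known up front.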
One change I made to more easily handle the S3 multipart upload is raising the `read_length` option from `1_000_000` bytes (~1 MB) to `5_242_880` bytes (5 MiB). Are there potential negative side effects from doing this (aside from the obvious 5x memory consumption)?
This was done because every part of an S3 multipart upload except the last must be at least 5 MiB, so this allows chunks based on `read_length` with no extra chunking logic…
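For comparison, here is a rough sketch of what separate chunking logic could look like if `read_length` were left at its default, or if a read ever returned fewer bytes than requested (that possibility is an assumption about the adapter, not something I have confirmed). It buffers incoming chunks until at least `min_size` bytes are available and emits them as one part. `Rechunk` is a hypothetical name:

```elixir
defmodule Rechunk do
  # Hypothetical helper for illustration only.
  @min_part_size 5 * 1024 * 1024

  # Turns an enumerable of binaries of any size into a lazy stream of
  # binaries that are each at least `min_size` bytes, except the last one.
  def parts(chunks, min_size \\ @min_part_size) do
    Stream.chunk_while(
      chunks,
      {[], 0},
      fn chunk, {buf, size} ->
        # Accumulate as iodata to avoid copying on every append.
        buf = [buf, chunk]
        size = size + byte_size(chunk)

        if size >= min_size do
          {:cont, IO.iodata_to_binary(buf), {[], 0}}
        else
          {:cont, {buf, size}}
        end
      end,
      fn
        # Nothing left over: emit no trailing part.
        {_buf, 0} -> {:cont, {[], 0}}
        # Leftover bytes become the final (possibly short) part.
        {buf, _size} -> {:cont, IO.iodata_to_binary(buf), {[], 0}}
      end
    )
  end
end
```

With this approach memory use is bounded by roughly one part plus one read, at the cost of an extra copy when the buffer is flattened.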
Anyway, here are some questions I have for the community:
Could an Elixir Stream be used here, similar to how it is done in NodeJS?
- i.e. `Plug.Upload` would actually return a list of streams to files that are lazily parsed from the body
- I was not sure how to create a stream from the uploaded chunks without introducing a memory leak, as I cannot control the upload pace
- I need to buffer those bytes somewhere, right?
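One possible shape for this, as a sketch only: `Stream.resource/3` is pull-based, so the next read happens only when the consumer asks for another element. The `read_fun` callback below is hypothetical; it stands in for something like `Plug.Conn.read_part_body/2` and is assumed to return `{:more, chunk, state}` while data remains and `{:ok, chunk, state}` for the final chunk:

```elixir
defmodule BodyStream do
  # Hypothetical helper for illustration only.
  # read_fun :: (state -> {:more, binary, state} | {:ok, binary, state})
  def from_reader(initial_state, read_fun) do
    Stream.resource(
      fn -> {:reading, initial_state} end,
      fn
        # The final chunk was already emitted: stop the stream.
        :done ->
          {:halt, :done}

        # Pull exactly one chunk per demand from the consumer.
        {:reading, state} ->
          case read_fun.(state) do
            {:more, chunk, next} -> {[chunk], {:reading, next}}
            {:ok, chunk, _next} -> {[chunk], :done}
          end
      end,
      fn _state -> :ok end
    )
  end
end
```

In this shape nothing is buffered beyond the chunk currently in flight. One caveat with Plug specifically: the updated `conn` lives inside the stream state, so getting it back out after the stream is consumed needs extra plumbing that this sketch does not show.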
Is this a good use case for Broadway/GenStage?
- Thinking something like this (p: producer, c: consumer):
- Multipart Parser (p) → Hash? (transform1) (p/c) → Encrypt (transform2) (p/c) → S3Upload (c)
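To make the stage layout concrete, here is the same pipeline sketched with plain `Stream.transform/4` stages instead of GenStage or Broadway, so it runs without extra dependencies. Everything here is an assumption for illustration: the module name, the `on_digest` callback, and the choice of SHA-256 and AES-256-CTR (which is unauthenticated and only stands in for whatever cipher is really used). The S3 upload stage is left out; whatever consumes the stream plays that role.

```elixir
defmodule PipelineSketch do
  # Hypothetical module for illustration only.

  # transform1: passes chunks through unchanged while feeding a running
  # SHA-256; the digest is handed to `on_digest` once the input ends.
  def with_hash(chunks, on_digest) do
    Stream.transform(
      chunks,
      fn -> :crypto.hash_init(:sha256) end,
      fn chunk, ctx -> {[chunk], :crypto.hash_update(ctx, chunk)} end,
      fn ctx -> on_digest.(:crypto.hash_final(ctx)) end
    )
  end

  # transform2: encrypts each chunk with one cipher state shared across
  # chunks, so the output equals encrypting the whole body at once.
  def encrypt(chunks, key, iv) do
    Stream.transform(
      chunks,
      fn -> :crypto.crypto_init(:aes_256_ctr, key, iv, true) end,
      fn chunk, state -> {[:crypto.crypto_update(state, chunk)], state} end,
      fn _state -> :ok end
    )
  end
end
```

Usage would look like `chunks |> PipelineSketch.with_hash(on_digest) |> PipelineSketch.encrypt(key, iv)`, followed by the upload step. Because every stage is lazy, each chunk flows through hashing and encryption before the next one is read. Whether separate GenStage processes would add anything over this depends on whether the stages need to run concurrently.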
Should more flexibility be introduced to the built-in parser so that its behavior can be altered to allow this type of functionality?
Should this be its own hex package, used as an alternative to the built-in multipart parser?
Thanks for taking the time to read this, I am very open to suggestions and ideas!
If anyone would like to help make this into a published hex package, let's collab!