Hello~
I am working on code that uploads video files (100 MB to 300 MB) to Amazon S3.
It takes quite some time, but I am not sure which Elixir module or function I should use for concurrency.
I’ve read some articles about it but still don’t understand it clearly.
Later on, I will also need to report upload progress.
Which one do you recommend?
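Example:
```elixir
def upload_video_to_s3_async(path, username) do
  s3_filename = create_s3_filename(path, username)

  # Task.async/1 links the task to the caller, and its result must be
  # collected with Task.await/2 (whose default 5-second timeout is far
  # shorter than a large upload takes).
  Task.async(fn ->
    path
    |> S3.Upload.stream_file()
    |> S3.upload(System.get_env("S3-Bucket-Name"), s3_filename)
    |> ExAws.request()
  end)
end
```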
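For the progress requirement, one option (a sketch, not something proposed in the thread) is to count bytes as chunks flow through the stream and send the running total to a listener process. The `UploadProgress` module, its functions, and the `{:upload_progress, sent, total}` message shape are hypothetical; `file_chunks/2` is a stdlib stand-in for `S3.Upload.stream_file/1` so the sketch runs without ex_aws.

```elixir
defmodule UploadProgress do
  # Hypothetical helper: wraps any enumerable of binaries. Each chunk passes
  # through unchanged, and the cumulative byte count is sent to `listener`
  # as {:upload_progress, bytes_sent, total_bytes}.
  def with_progress(chunks, total_bytes, listener) do
    Stream.transform(chunks, 0, fn chunk, sent ->
      sent = sent + byte_size(chunk)
      send(listener, {:upload_progress, sent, total_bytes})
      {[chunk], sent}
    end)
  end

  # Stdlib-only chunked file reader, standing in for S3.Upload.stream_file/1.
  def file_chunks(path, chunk_size \\ 5 * 1024 * 1024) do
    Stream.resource(
      fn -> File.open!(path, [:read, :binary]) end,
      fn device ->
        case IO.binread(device, chunk_size) do
          :eof -> {:halt, device}
          {:error, reason} -> raise "could not read #{path}: #{inspect(reason)}"
          data -> {[data], device}
        end
      end,
      &File.close/1
    )
  end

  # Percentage rounded to one decimal; an empty file counts as complete.
  def percent(_sent, 0), do: 100.0
  def percent(sent, total), do: Float.round(sent * 100 / total, 1)
end
```

In the question’s pipeline, `with_progress/3` would sit between `S3.Upload.stream_file()` and `S3.upload(...)`, with the total taken from `File.stat!(path).size` and the listener being whatever process displays progress. Since the uploader may read several parts ahead of sending them, this measures bytes handed to the uploader, which can run somewhat ahead of what S3 has acknowledged.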
Where does the video file come from? If a user uploads it to you, I would recommend using presigned S3 URLs and having the user upload it directly to S3, instead of routing it through your application as a middleman.
Yes, that’s correct. My app does this for image uploads, and it looks something like this:
1. The user initiates a file upload, and the client asks the API for a presigned URL.
2. The API responds with the URL and credentials the client needs to upload to S3 directly.
3. The client completes the upload and keeps track of the response from AWS, including the key, prefix, and URL of the new asset.
4. When the save action runs (either by autosave or by the user clicking save), the client sends the info from AWS to the API, and the API stores enough of it in the database to build a valid URL to that asset.
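The API side of that flow could be sketched roughly like this. Presigning is passed in as a function so the sketch runs without AWS credentials; with the ex_aws_s3 package one might pass `fn bucket, key -> ExAws.S3.presigned_url(ExAws.Config.new(:s3), :put, bucket, key, expires_in: 900) end`. The module name, bucket name, region, and the `uploads/<user>/<random>/<file>` key layout are all hypothetical.

```elixir
defmodule DirectUpload do
  # Hypothetical names: adjust bucket and region to your setup.
  @tmp_bucket "my-app-tmp-uploads"
  @region "us-east-1"

  # Steps 1-2: the client asks for an upload target. A random segment keeps
  # two users' "video.mp4" from colliding. `presign` is an injected
  # fn(bucket, key) -> {:ok, url} | {:error, reason} (assumption).
  def new_upload(user_id, filename, presign) do
    key = Enum.join(["uploads", user_id, random_id(), Path.basename(filename)], "/")

    with {:ok, url} <- presign.(@tmp_bucket, key) do
      {:ok, %{upload_url: url, bucket: @tmp_bucket, key: key}}
    end
  end

  # Step 4: store only bucket + key in the database and rebuild the URL on
  # demand. This virtual-hosted-style URL only works for publicly readable
  # objects; private objects would need a presigned GET instead.
  def asset_url(bucket, key, region \\ @region) do
    encoded_key =
      key
      |> String.split("/")
      |> Enum.map_join("/", fn segment -> URI.encode(segment, &URI.char_unreserved?/1) end)

    "https://#{bucket}.s3.#{region}.amazonaws.com/#{encoded_key}"
  end

  defp random_id do
    :crypto.strong_rand_bytes(12) |> Base.url_encode64(padding: false)
  end
end
```

Storing the bucket and key rather than a full URL keeps the database independent of the URL format, so switching to a CDN domain later only changes `asset_url/3`.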
There are some other specific things that we do as well. For example, assets that are uploaded directly to S3 from the client go into an S3 bucket that has a content expiration in place. When the user hits the save action, the API initiates a copy from the temporary bucket to a bucket where the asset is stored permanently. This way, if someone uploads a bunch of images and never saves the form they’re on, we don’t accumulate a bunch of junk assets.
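The save-time promotion could be sketched the same way. The copy is injected so the sketch runs offline; with ex_aws_s3 it might be `fn dest_bucket, dest_key, src_bucket, src_key -> ExAws.S3.put_object_copy(dest_bucket, dest_key, src_bucket, src_key) |> ExAws.request() end`. The permanent bucket name and the `uploads/` to `assets/` key rewrite are assumptions. The expiration itself would be an S3 lifecycle rule on the temporary bucket, configured once rather than in application code.

```elixir
defmodule AssetPromotion do
  # Hypothetical bucket name for permanently stored assets.
  @permanent_bucket "my-app-assets"

  # Called from the save action. `copy` is an injected
  # fn(dest_bucket, dest_key, src_bucket, src_key) -> {:ok, _} | {:error, reason}
  # that performs a server-side copy (assumption). The temporary object is
  # left in place; the tmp bucket's expiration rule removes it later, along
  # with anything that was never saved.
  def promote(%{bucket: src_bucket, key: src_key}, copy) do
    # Assumes temporary keys start with "uploads/" (hypothetical layout).
    dest_key = String.replace_prefix(src_key, "uploads/", "assets/")

    case copy.(@permanent_bucket, dest_key, src_bucket, src_key) do
      {:ok, _} -> {:ok, %{bucket: @permanent_bucket, key: dest_key}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

The returned map is what the API would persist, so the stored record always points at the permanent copy.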
In case you absolutely need to process the upload server side, you can use an unlinked, supervised process. However, you will have to make a copy of the uploaded file first: it is a temporary file that is deleted when the process that received it terminates, i.e. when the request returns.
```elixir
# add a supervisor for your unlinked process in your application.ex
def start(_type, _args) do
  children = [
    YourApp.Repo,
    YourAppWeb.Endpoint,
    {Task.Supervisor, name: YourApp.TaskSupervisor}
  ]

  opts = [strategy: :one_for_one, name: YourApp.Supervisor]
  Supervisor.start_link(children, opts)
end
```
```elixir
# then, in whatever module you are processing the file, add a function to handle the upload
def my_uploader_function(attrs) do
  # make your own tmp file while the request process still owns the original
  path = "#{attrs["file"].path}-copy"
  :ok = File.cp(attrs["file"].path, path)

  # spin up the process; this returns a %Task{} holding the pid and a monitor ref
  Task.Supervisor.async_nolink(YourApp.TaskSupervisor, fn ->
    file = %{attrs["file"] | path: path}
    prepare_and_store_files(file)
  end)
end

defp prepare_and_store_files(file) do
  # do whatever you need to do with the file (some image manipulation or whatever) and upload it
  # (placeholder return value)
  :ok
rescue
  error ->
    # maybe delete some database entries that depend on a successful file upload, etc.
    {:error, error}
after
  # runs whether or not the upload succeeded, so the copy is always removed
  File.rm(file.path)
end
```
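Note that `Task.Supervisor.async_nolink/2` still reports back to the caller: it monitors the task, sends `{ref, result}` on success, and sends `{:DOWN, ref, ...}` when the task exits. A long-lived caller such as a GenServer or LiveView should match both in `handle_info/2`; if nothing needs the result, `Task.Supervisor.start_child/2` is the fire-and-forget alternative. A minimal blocking sketch of handling those messages (the `UploadRunner` name and the timeout handling are illustrative, not from the thread):

```elixir
defmodule UploadRunner do
  # Starts `fun` under the Task.Supervisor `sup` without linking it to the
  # caller, then waits for its reply or its :DOWN message. A GenServer or
  # LiveView would match these same messages in handle_info/2 instead.
  def run(sup, fun, timeout \\ 5_000) do
    task = Task.Supervisor.async_nolink(sup, fun)
    ref = task.ref

    receive do
      {^ref, result} ->
        # Reply received: drop the monitor and the :DOWN message that follows.
        Process.demonitor(ref, [:flush])
        {:ok, result}

      {:DOWN, ^ref, :process, _pid, reason} ->
        # The task crashed; the caller keeps running because it is not linked.
        {:error, reason}
    after
      timeout ->
        # Give up and stop the still-running task.
        Task.shutdown(task, :brutal_kill)
        {:error, :timeout}
    end
  end
end
```

A crash inside the task surfaces as `{:error, {exception, stacktrace}}` instead of taking the caller down with it.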
You could retry the whole thing in the rescue block a few times, but in that case I would probably rather use something like exq.
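If you do retry in-process, a small exponential-backoff helper keeps the rescue logic simple. This is plain Elixir, not exq’s API; `Retry.with_backoff/3` and its defaults are hypothetical. Retries held in memory are lost if the node restarts, which is one reason to prefer a persistent job queue such as exq for uploads that must not be dropped.

```elixir
defmodule Retry do
  # Calls `fun` up to `attempts` times. `fun` should return {:ok, _} or
  # {:error, _}; a raised exception counts as {:error, exception}.
  # Waits base_ms, 2*base_ms, 4*base_ms, ... between attempts.
  def with_backoff(fun, attempts \\ 3, base_ms \\ 500) when attempts >= 1 do
    attempt(fun, 1, attempts, base_ms)
  end

  defp attempt(fun, n, max, base_ms) do
    result =
      try do
        fun.()
      rescue
        exception -> {:error, exception}
      end

    case result do
      {:ok, _} ->
        result

      {:error, _} when n >= max ->
        result

      {:error, _} ->
        Process.sleep(trunc(base_ms * :math.pow(2, n - 1)))
        attempt(fun, n + 1, max, base_ms)
    end
  end
end
```

Inside the task this might look like `Retry.with_backoff(fn -> upload(file) end)`, where `upload/1` is a hypothetical function returning `{:ok, _}` or `{:error, _}`.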
Note: according to the Elixir docs, the file should be removed automatically when your process exits, but when I tried it, I ended up with a bunch of files in my folder. Maybe someone could tell me what I did wrong ^^