When I do large file uploading, which process function do you recommend?

freewebwithme · December 3, 2019, 10:27pm

Hello~
I am working on code that uploads video files (100 MB to 300 MB) to Amazon S3.
It takes quite some time, but I am not sure which Elixir module or function I should use for concurrency.
I read some articles about it but still don’t understand it clearly.
In the future, I will also need to report upload progress.
Which one do you recommend?

example

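def upload_video_to_s3_async(path, username) do
  s3_filename = create_s3_filename(path, username)

  # Task.async links the task to the caller and always sends a reply,
  # so the caller is expected to Task.await/2 the returned task.
  Task.async(fn ->
    path
    # stream the file from disk in chunks instead of loading it into memory
    |> S3.Upload.stream_file()
    |> S3.upload(System.get_env("S3-Bucket-Name"), s3_filename)
    |> ExAws.request()
  end)
end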

Task, Task.Supervisor, spawn?
I need your help!

benwilson512 · December 3, 2019, 10:28pm

Where does the video file come from? If someone has uploaded it to you, I would recommend using presigned S3 urls and having the user upload it directly to S3 instead of having your application as a middleman.

freewebwithme · December 3, 2019, 11:20pm

Hi! Thanks for your reply.
(I am actually a fan of your book and am building a backend API using Absinthe.)
Anyway.

The video comes from users. Users record a video and upload it to Amazon S3.

What does this mean? The user uploads the video to Amazon S3 through my backend API.

hauleth · December 3, 2019, 11:55pm

And that is the problem. Instead, have the user upload the video directly to S3, without even touching your backend API.

outlog · December 4, 2019, 12:19am

Use arc or the (afaik) more recently updated fork, https://github.com/elixir-waffle/waffle, to upload to S3 through your own server. That said, uploading large files like video through or to your own server is something I would generally avoid, especially if they end up stored in S3. See this answer, https://stackoverflow.com/questions/42211542/elixir-phoenix-client-side-browser-ajax-upload-to-s3, for the “magic” URL that lets the user upload directly to S3.

freewebwithme · December 4, 2019, 2:55am

Thanks for your link…
So on the server side I only generate the presigned URL, and then the upload happens on the client side using that presigned URL. Am I correct?

seanwash · December 26, 2019, 9:00pm

Yes, that’s correct. My app does this for image uploads and it looks something like this:

1. User initiates a file upload, client asks the API for a presigned URL
2. API responds with the URL and credentials the client needs to upload to S3 directly
3. The client completes the upload and keeps track of the response from AWS, including the key, prefix, and the URL to the new asset
4. When the save action is run (either by auto save or by a user clicking save), the client sends a request to the API with the info from AWS and the API stores enough info in the database to build a valid URL to that asset.

There are some other specific things that we do as well. For example, assets that are uploaded directly to S3 from the client go into an S3 bucket that has a content expiration in place. When the user hits the save action, the API initiates a copy process from the temporary bucket to a bucket where the asset is stored permanently. This way, if someone uploads a bunch of images and never saves the form they’re on, we don’t accumulate a bunch of junk assets.
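
For anyone who wants to see roughly what the server side of this flow could look like with ExAws, here is a minimal sketch. It assumes the ex_aws and ex_aws_s3 packages are installed and configured with credentials. The module name, bucket names, and key layout are hypothetical, and it uses a presigned PUT URL, which may differ from what the app described above actually does.

```elixir
defmodule MyApp.DirectUpload do
  # Hypothetical bucket names: the temp bucket is assumed to have an
  # expiration (lifecycle) rule, the permanent one does not.
  @temp_bucket "my-app-uploads-tmp"
  @permanent_bucket "my-app-assets"

  # Builds the S3 object key the client will upload to (illustrative layout).
  def object_key(username, filename) do
    "#{username}/#{System.unique_integer([:positive])}-#{Path.basename(filename)}"
  end

  # Returns {:ok, url} on success. The client uploads the file with an
  # HTTP PUT to this URL, so the bytes never pass through the backend.
  def presigned_put_url(key, expires_in_seconds \\ 900) do
    :s3
    |> ExAws.Config.new()
    |> ExAws.S3.presigned_url(:put, @temp_bucket, key, expires_in: expires_in_seconds)
  end

  # Called when the user saves: copies the object from the temporary
  # bucket into the permanent one, keeping the same key.
  def promote(key) do
    @permanent_bucket
    |> ExAws.S3.put_object_copy(key, @temp_bucket, key)
    |> ExAws.request()
  end
end
```

The API endpoint would call `object_key/2` and `presigned_put_url/1`, return the key and URL to the client, and later call `promote/1` with the same key once the form is saved.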

Ninigi · December 27, 2019, 12:00am

In case you absolutely need to process the upload server side, you can use an unlinked supervised process, but you will have to make a copy of the uploaded file (because it’s a tmp file that will be deleted when the parent process is terminated, aka when the request returns).

# add a supervisor for your unlinked process in your application.ex
def start(_type, _args) do
  children = [
    YourApp.Repo,
    YourAppWeb.Endpoint,
    {Task.Supervisor, name: YourApp.TaskSupervisor}
  ]

  opts = [strategy: :one_for_one, name: YourApp.Supervisor]
  Supervisor.start_link(children, opts)
end

# then - in whatever module you are processing the file - add a function to handle the upload
def my_uploader_function(attrs) do
  # make your own tmp file outside of the process
  path = "#{attrs["file"].path}-copy"
  :ok = File.cp(attrs["file"].path, path)

  # spin up the process - this returns a %Task{} struct, which includes the PID
  Task.Supervisor.async_nolink(YourApp.TaskSupervisor, fn ->
    file = %{attrs["file"] | path: path}
    prepare_and_store_files(file)
  end)
end

defp prepare_and_store_files(file) do
  # do whatever you need to do with the file (some image manipulation or whatever) and upload
  :ok
rescue
  error ->
    # maybe delete some database entries that depend on successful file upload etc.
    {:error, error}
after
  File.rm(file.path)
end
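
If the caller never needs the task's result, a fire-and-forget variant with `Task.Supervisor.start_child/2` may be a closer fit than `async_nolink`, which always sends a reply back to the caller. The sketch below uses only the standard library. `BackgroundUpload` and the `work` function are hypothetical stand-ins for the real processing and upload step.

```elixir
defmodule BackgroundUpload do
  # Copies the request's temp file, then processes the copy in a supervised,
  # unlinked task. Returns {:ok, pid} as soon as the task has started.
  def start(supervisor, source_path, work) when is_function(work, 1) do
    copy_path = source_path <> "-copy"
    :ok = File.cp(source_path, copy_path)

    Task.Supervisor.start_child(supervisor, fn ->
      try do
        work.(copy_path)
      after
        # The copy belongs to us, so remove it whether `work` succeeds or raises.
        File.rm(copy_path)
      end
    end)
  end
end

# Usage sketch with a throwaway file instead of a real upload:
{:ok, sup} = Task.Supervisor.start_link()
source = Path.join(System.tmp_dir!(), "upload-#{System.unique_integer([:positive])}")
File.write!(source, "fake video bytes")

{:ok, pid} = BackgroundUpload.start(sup, source, fn path -> File.stat!(path).size end)

# Wait for the task to finish (only needed in this demo).
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
after
  5_000 -> raise "task did not finish in time"
end
```

In a real application the supervisor would be the named one from application.ex (`YourApp.TaskSupervisor`) rather than one started inline.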

You could retry the whole thing in the rescue block a few times - but in that case I would probably rather use something like exq
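
For the simple in-process retry mentioned above, a minimal sketch could look like the following. `Retry.with_backoff/3` is a hypothetical helper, not part of any library. It assumes the wrapped function returns `:ok`, `{:ok, value}`, or an error value (as `ExAws.request/1` does), and it gives no durability across restarts, which is where a job queue like exq comes in.

```elixir
defmodule Retry do
  # Calls `fun` until it returns :ok or {:ok, _}, at most `attempts` times,
  # sleeping between tries and doubling the delay each time.
  def with_backoff(fun, attempts \\ 3, delay_ms \\ 200)
      when is_function(fun, 0) and attempts > 0 do
    case safe_call(fun) do
      :ok ->
        :ok

      {:ok, _} = ok ->
        ok

      # Last attempt: give up and return whatever the error was.
      error when attempts == 1 ->
        error

      _error ->
        Process.sleep(delay_ms)
        with_backoff(fun, attempts - 1, delay_ms * 2)
    end
  end

  # Turns a raised exception into an error tuple so it can be retried too.
  defp safe_call(fun) do
    fun.()
  rescue
    exception -> {:error, exception}
  end
end
```

Usage would be something like `Retry.with_backoff(fn -> upload(file) end)`, where `upload/1` stands for whatever function performs the actual S3 request.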

Note: According to the Elixir docs the file should be removed automatically when your process exits, but when I tried it, I ended up with a bunch of files in my folder. Maybe someone could tell me what I did wrong ^^