Upload external files from URLs (automated, no front-end)

I’m looking for a good way to fetch files from URLs and upload them to an S3 bucket. I’ve seen a lot of posts here about uploads, but they’re generally focused on images uploaded by users via a form. The scenario I’m working on is an automated script fetching images from a file server and putting them onto S3.

I have a list of [img_url1, img_url2, img_url3, | _the_rest], and I want to fetch and upload all of those files to my S3 bucket. What’s a good way to go about it?

I’m using Phoenix, but AFAIK that’s probably not very relevant to this task.

You’ll essentially be fetching the files with your preferred HTTP client library and pushing the data to S3 with your preferred AWS library. From there you can layer optimizations like streaming or parallelism on top if needed.
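Not Elixir, but the shape of that loop fits in a few lines of shell, which may help as a reference. This is a hedged sketch: it assumes `curl` and a configured AWS CLI are available, and `my-bucket` is a placeholder bucket name.

```shell
# Sketch: fetch each URL to a temp file, then push it to S3.
# Assumes `curl` and a configured AWS CLI; "my-bucket" is a placeholder.
upload_from_urls() {
  bucket="$1"; shift
  for url in "$@"; do
    key=$(basename "$url")            # object key derived from the URL path
    tmp=$(mktemp)
    curl -fsSL "$url" -o "$tmp" && aws s3 cp "$tmp" "s3://$bucket/$key"
    rm -f "$tmp"
  done
}

# Usage (placeholders):
# upload_from_urls my-bucket "$img_url1" "$img_url2" "$img_url3"
```

An HTTP client plus an S3 client in Elixir would follow the same two steps: download the body, then put the object.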


Any reason why you need to write your own code at all? I’ve used rclone for similar tasks, many times, with crushing success.

I have used Waffle for this.

https://hexdocs.pm/waffle_ecto/Waffle.Ecto.Schema.html#cast_attachments/4-examples


Hey, Waffle author here.

You can do this via the Waffle library like this:

Avatar.store("http://example.com/file.png") #=> {:ok, "file.png"}

You only need to configure access to your S3 storage and define an uploader.
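For anyone following along, the storage configuration is a few lines in `config/config.exs`. This is a hedged sketch — the bucket and region values are placeholders, and it assumes Waffle’s S3 storage backed by ExAws:

```elixir
# config/config.exs — bucket and region are placeholders
config :waffle,
  storage: Waffle.Storage.S3,
  bucket: "my-bucket"

config :ex_aws,
  access_key_id: {:system, "AWS_ACCESS_KEY_ID"},
  secret_access_key: {:system, "AWS_SECRET_ACCESS_KEY"},
  region: "us-east-1"
```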

Please feel free to ask if you run into any trouble.


This is my default plan, though which library is another question :joy:

I’m not sure rclone or Waffle can fetch the files from their URLs, so taking the boring approach with HTTP clients is the solution. That said, if anyone knows of a library that removes the need to temporarily store files on the web server — or, better yet, one that completely automates uploading a file from its URL to S3 — please chime in!
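On the “no temp files on the web server” point: if shelling out is acceptable, `aws s3 cp` accepts `-` as the source and reads the object body from stdin, so the download can be piped straight through without touching disk. A hedged sketch — bucket name is a placeholder:

```shell
# Stream curl's output directly into S3; nothing is written locally.
# Assumes `curl` and a configured AWS CLI; bucket name is a placeholder.
stream_to_s3() {
  url="$1"; bucket="$2"
  curl -fsSL "$url" | aws s3 cp - "s3://$bucket/$(basename "$url")"
}

# Usage (placeholders):
# stream_to_s3 "http://example.com/file.png" my-bucket
```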

Oh this is interesting. I’ll take a look at your docs now.

s3cmd and the AWS CLI’s `s3` subcommand have a `sync` command that can synchronize between buckets and regions, but I don’t know if you can use it to sync between S3-compatible providers. It seems like a logical goal, but I haven’t configured multiple providers myself, so I can’t say whether it’s possible.

GitHub - s3tools/s3cmd (the official s3cmd repo, a command-line tool for managing Amazon S3 and CloudFront) and sync — AWS CLI 1.27.145 Command Reference cover the aws cli. I’m not at my laptop at the moment to look at s3cmd’s output, but it’s one of the more robust CLIs and seemingly covers everything I throw at it.
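One caveat worth noting: a single `aws s3 sync` invocation talks to one endpoint at a time (set with the global `--endpoint-url` option), so syncing between two S3-compatible providers usually means staging through local disk. A hedged sketch — the endpoint URL and bucket names are placeholders:

```shell
# Pull from a Spaces-style provider, then push to the default AWS endpoint.
# Endpoint URL and bucket names are placeholders; assumes a configured AWS CLI.
sync_via_staging() {
  staging="$1"
  aws s3 sync s3://my-bucket "$staging" \
    --endpoint-url https://nyc3.digitaloceanspaces.com
  aws s3 sync "$staging" s3://my-bucket   # default AWS endpoint
}
```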

My coworker is in the process of synchronizing our entire bucket between local storage and another provider due to issues with DigitalOcean Spaces. I’ll try to hit him up later today to see how he’s getting that done. It wouldn’t be Elixir, but knowing his steps could help you work backwards if you go down the HTTP client route.
