silvagustin

Upload to S3 from Google Drive by downloading the file in chunks and uploading it to S3 at the same time

Hello everyone.

I’m working on an application that lists all your Google Drive files and allows you to upload them to our app (which uses S3).

Unfortunately, it’s not possible to pass the file URL provided by Drive to waffle_ecto with the allow_urls: true option. The only way I’ve found is to download the file to a temporary folder first and then upload it to S3. That works, but is it possible to skip the temporary folder and upload the file while it is still being downloaded?

Well, I tried but failed. The async_download/2 function is taken from the Poeticoding article “Download Large Files with HTTPoison Async Requests” by @alvises, and it works perfectly. I believe the problem is the chunk argument I pass to ExAws.S3.put_object/3 inside upload_chunk_to_s3/2. Currently, chunk is iodata. I’ve also tried converting it to a binary with IO.iodata_to_binary/1 and encoding it to Base64 with Base.encode64/2, but both attempts failed.

Here is the code involved:

defmodule GoogleDriveApi do
  @moduledoc """
  Google Drive API module.
  """

  @base_url "https://www.googleapis.com/drive/v3"

  @doc """
  Downloads a single file from Google Drive API.
  """
  def download_file(access_token, file, filename) do
    url = @base_url <> "/files/" <> file["id"]

    headers = [
      Authorization: "Bearer #{access_token}",
      Accept: "Application/json; Charset=utf-8"
    ]

    options = [
      params: [
        alt: "media"
      ],
      stream_to: self(),
      async: :once
    ]

    with {:ok, resp} <- HTTPoison.get(url, headers, options),
         :ok <- async_download(resp, filename) do
      {:ok, ""}
    end
  end

  defp async_download(resp, filename) do
    resp_id = resp.id

    receive do
      %HTTPoison.AsyncStatus{code: 200, id: ^resp_id} ->
        HTTPoison.stream_next(resp)
        async_download(resp, filename)

      %HTTPoison.AsyncStatus{code: status_code, id: ^resp_id} ->
        IO.inspect(status_code)

      %HTTPoison.AsyncHeaders{headers: _headers, id: ^resp_id} ->
        HTTPoison.stream_next(resp)
        async_download(resp, filename)

      %HTTPoison.AsyncChunk{chunk: chunk, id: ^resp_id} ->
        upload_chunk_to_s3(filename, chunk)

        HTTPoison.stream_next(resp)

        async_download(resp, filename)

      %HTTPoison.AsyncEnd{id: ^resp_id} ->
        :ok
    end
  end

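  defp upload_chunk_to_s3(filename, chunk) do
    IO.puts("UPLOAD CHUNK TO S3")

    path_to_s3 = "tmp/" <> filename

    # Note: put_object writes a complete object, so every call replaces
    # whatever was previously stored under this key.
    get_s3_bucket()
    |> ExAws.S3.put_object(path_to_s3, chunk)
    |> ExAws.request!()
  end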
end

One thing I forgot to mention: I saw some questions on Stack Overflow that solve this using JavaScript and Buffers. Maybe it’s not possible on the server side and I have to try it on the client side?

Any advice would be appreciated!

Cheers,
Agustín Silva.

Most Liked

cjbottaro

objects = ExAws.S3.list_objects_v2(bucket, prefix: prefix)
|> ExAws.stream!()
|> Stream.reject(fn %{key: key} ->
  String.split(key, "/")
  |> Enum.any?(& &1 == "report")
end)
|> Stream.map(fn %{key: key} ->
  stream = ExAws.S3.download_file(bucket, key, :memory)
  |> ExAws.stream!()

  %{key: key, stream: stream}
end)

# We could have kept piping, but thought this was more readable maybe.

Stream.map(objects, &(Zstream.entry(&1.key, &1.stream)))
|> Zstream.zip()
|> Aw.Stream.chunk_by_bytes({5, :MiB})
|> ExAws.S3.upload(bucket, Path.join(prefix, "foobar.zip"))
|> ExAws.request!()

Not exactly the same, but our goal was “taking many files on S3, downloading them, and zipping them into a single file that we upload to S3.” This is all done with a single Elixir stream.

Despite the output being many gigabytes (maybe close to a terabyte), the process only uses a couple of hundred megabytes of memory.

I’m not sure what the Google Drive API is like, but you should be able to Stream.resource that shit! Once it’s in stream form, everything else is gravy.
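As a rough illustration of the Stream.resource idea, here is a minimal sketch that turns the HTTPoison async messages from the question into a lazy stream of chunks. The module name DriveDownload and the 30-second timeout are made up for the example, and it assumes the httpoison dependency (which uses hackney) is installed. Because HTTPoison delivers its messages to self(), the stream has to be consumed in the same process that starts it.

```elixir
defmodule DriveDownload do
  # Hypothetical module: wraps an HTTPoison async request in a lazy stream
  # of response chunks. Nothing is requested until the stream is consumed.
  def stream(url, headers) do
    Stream.resource(
      # Start: fire the async request; messages are sent to the consumer.
      fn -> HTTPoison.get!(url, headers, stream_to: self(), async: :once) end,
      # Next: wait for one message and translate it into stream elements.
      &next_chunk/1,
      # After: stop the hackney request if the consumer halts early or crashes.
      fn resp -> :hackney.stop_async(resp.id) end
    )
  end

  defp next_chunk(%HTTPoison.AsyncResponse{id: id} = resp) do
    receive do
      %HTTPoison.AsyncStatus{id: ^id, code: 200} ->
        HTTPoison.stream_next(resp)
        {[], resp}

      %HTTPoison.AsyncStatus{id: ^id, code: code} ->
        raise "unexpected HTTP status #{code}"

      %HTTPoison.AsyncHeaders{id: ^id} ->
        HTTPoison.stream_next(resp)
        {[], resp}

      %HTTPoison.AsyncChunk{id: ^id, chunk: chunk} ->
        # Ask for the next message, then emit this chunk downstream.
        HTTPoison.stream_next(resp)
        {[chunk], resp}

      %HTTPoison.AsyncEnd{id: ^id} ->
        {:halt, resp}
    after
      # Assumed timeout so a stalled download does not block forever.
      30_000 -> raise "timed out waiting for the next chunk"
    end
  end
end
```

The chunks coming out of this stream are small, so they would still need to be regrouped by size (as the pipeline above does with its chunk_by_bytes step) before being handed to ExAws.S3.upload and ExAws.request!, which should then upload the file without it ever touching the disk.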

evadne

You need to use S3 multipart upload but pay attention to the maximum chunk count and minimum chunk size.

I solved the problem by generating a state token which is continuously exchanged for part signatures (incrementally, to allow client uploads).

Should be easier if you were doing it server-side.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
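For reference, S3 requires every part of a multipart upload except the last to be at least 5 MiB, and allows at most 10,000 parts per upload, so 5 MiB parts cap an object at roughly 48.8 GiB. The sketch below shows one way to regroup a stream of small chunks into parts that respect the minimum size. PartChunker and chunk_by_bytes are hypothetical names for this example, not part of ExAws.

```elixir
defmodule PartChunker do
  # Hypothetical helper: regroups a stream of small binaries (or iodata)
  # into binaries of at least `min_bytes` bytes. Only the last emitted
  # element may be smaller, which matches the S3 multipart rule.
  @five_mib 5 * 1024 * 1024

  def chunk_by_bytes(stream, min_bytes \\ @five_mib) do
    Stream.chunk_while(
      stream,
      {[], 0},
      fn chunk, {acc, size} ->
        size = size + IO.iodata_length(chunk)
        # Accumulate as iodata to avoid copying on every small chunk.
        acc = [acc, chunk]

        if size >= min_bytes do
          {:cont, IO.iodata_to_binary(acc), {[], 0}}
        else
          {:cont, {acc, size}}
        end
      end,
      fn
        # Nothing buffered: do not emit an empty trailing part.
        {_acc, 0} -> {:cont, {[], 0}}
        # Flush whatever is left as the final (possibly short) part.
        {acc, _size} -> {:cont, IO.iodata_to_binary(acc), {[], 0}}
      end
    )
  end
end
```

Each element of the resulting stream can then serve as one part, for example by piping it into ExAws.S3.upload(bucket, path) followed by ExAws.request!(). For very large files the part size would have to grow so that the part count stays under 10,000.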

evadne

Hi

  1. My chunked copier is implemented directly on top of ibrowse.

  2. Yeah: see packmatic/url.ex on the develop branch of evadne/packmatic on GitHub, but keep in mind it does not expose a stream.
