File uploads - validate / virus-scan / upload / insert method

I’m tempted to use Ecto.Multi here, but I think this is not recommended as it keeps a DB transaction open for multiple API requests - is that correct?:

def insert_document(params) do
  changeset = Document.changeset(%Document{}, params)
  |> Multi.insert(:document, changeset)
  |>, fn _, %{document: document} -> 
    # hits external API
  |>, fn _, %{document: document} -> 
    # also hits external API
    S3.upload(document, params["file"])
  |>, fn _, %{document: document, bucket_upload: url} ->
     document |> change(%{url: url}) |> Repo.update()

Is it better to use with here?

def insert_document(params) do

    %{valid?: true} = changeset <- Document.changeset(%Document{}, params),
    {:ok, :no_virus} <- Virus.check_upload(params["file"]) ,
    {:ok, url} <- S3.upload(Changeset.apply_changes(changeset), params["file"]) do

      |> Changeset.put_change(:url, url)
      |> Repo.insert()

Any advice or general comments on that code would be appreciated :+1:

Uploading files to cloud services is imo best done by uploading the files to a temporary place outside of any transaction and then in the transaction queuing up a job for moving the file to a permanent place in unison with inserting the document.

I’d do the same with your virus check. Upload the file as is and in the transaction store the document with a flag, that virus check was not yet done. Queue up the virus check. When the virus check is successful toggle the flag and queue up moving the file to a permanent location.

This way even without the transactional guarantees in the execution of those jobs you at least have a history about what needed to happen and where it failed. This also means you can switch to direct to S3 uploads without great changes to the business logic. And especially important: A error in the document means the file on S3 is never moved and therefore eventually deleted. Just checking the changeset for errors is not enough to ensure the document will actually be inserted in the db. The db could have an outage at the moment.

1 Like

Thanks @LostKobrakai that makes a lot of sense. In the next week or so I’d like to put together a little phoenix project which achieves this flow. I’ll update when it’s done and see if you can validate my implementation?

When looking for the temporary upload, would a different S3 bucket work?

And as for the quick-fix solution, would you suggest the Multi or the with?

Doesn’t need to be. Livecycle config can be per prefix on S3, so it could also just be a different folder.

I’d use the with option and just hope the insert is always successful. I’m not sure it’s worth it trying to put virus checks and uploads within the transaction.

1 Like