For no obvious reason, my request hangs when I try to upload hundreds of images/files to AWS

When the request hangs, "control gained back" is never printed (this happens when I try to upload files bigger than 20 MB), but I see the error below in my console:

[warn] ExAws: HTTP ERROR: :closed for URL: "https://logs.us-east-2.amazonaws.com/" ATTEMPT: 1

I use the code below to upload the files:

res =
  files_list
  |> Task.async_stream(&upload_file(&1, images_upload), max_concurrency: 4, timeout: 1_500_000)
  |> Stream.run()

IO.inspect("control gained back")
res

and

def upload_file({file_type, local_file_path, file_name}, %ImagesUpload{} = images_upload) do
    file = File.read!(local_file_path)
    destination_path = Path.join([images_upload.upload_id, @file_types[file_type], file_name])

    S3.put_object(component_bucket(), destination_path, file)
    |> ExAws.request!()

    Logger.info("uploaded image #{destination_path}")
    {:ok, destination_path}
end

and in Postman I get the message:

Could not get response
Error: socket hang up


@tanweerdev The connection could be closed for various reasons, more or less related to the client. However, if you haven't hit some edge case, I think you're hitting a timeout. Did you try changing the hackney options? Did you try the retries configuration?

I guess you have just started using this library, or some of your configuration is still at the defaults (ATTEMPT: 1, so most probably no retries configuration). For bigger files, and especially on a slow internet connection, the chance of a timeout can be really big, so it's important to use a proper download/upload strategy regardless of the service/software you use, as everything has limits.
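For reference, such a configuration could look roughly like this sketch (the values are only illustrative, not recommendations; the keys are ex_aws's documented retry/backoff settings):

config :ex_aws, :retries,
  max_attempts: 10,           # retry a failed request up to 10 times
  base_backoff_in_ms: 10,     # starting point for the exponential backoff
  max_backoff_in_ms: 10_000   # never wait longer than 10 s between attempts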

Also, is your internet stable? I don't think it's that, but it's just a good example… Some time ago the fiber network I'm using was overloaded (people staying at home, read: "facebook", because of covid restrictions) and I had many connection problems at that time.


I see; in some cases it retried up to 5 times. I have added the 25-minute timeout option to Task.async_stream, but even before 25 minutes the request is closed and control is not returned.

It could be the reason, but both I and a friend have tried on a stable internet connection, and the same happens whenever the total size of the files is more than 20 MB.

The problem is that I am always returning {:ok, some_response}, so it should give back control even if something didn't go as expected.
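Although, now that I write it out: upload_file/2 uses ExAws.request!/1, which raises on failure, and by default Task.async_stream exits the caller when a task times out, so a failure could still take the whole stream down before control comes back. A minimal defensive sketch (not the code we actually run):

results =
  files_list
  |> Task.async_stream(
    fn file ->
      try do
        upload_file(file, images_upload)
      rescue
        e -> {:error, Exception.message(e)}   # turn a raise into a return value
      end
    end,
    max_concurrency: 4,
    timeout: 1_500_000,
    # yield {:exit, :timeout} for a slow task instead of exiting the caller
    on_timeout: :kill_task
  )
  |> Enum.to_list()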

No, but I'm trying that now.

Now I have added the configs below:

config :ex_aws, :retries,
  max_attempts: 10

config :ex_aws, :hackney_opts,
  recv_timeout: 1_500_000,
  timeout: 1_500_000

and

config :cybord, MyApp.Endpoint,
  http: [
    port: 4000,
    protocol_options: [
      max_request_line_length: 99_999_999_999,
      idle_timeout: :infinity
    ]
  ]
...

Just to be on the safe side.

Hmm… previously I only used smaller files, so I don't really know what might be going wrong, but I found this page:
https://aws.amazon.com/premiumsupport/knowledge-center/s3-upload-large-files/

What do you think about playing a bit with multipart_chunksize? Maybe it would be good for you to set it to something like 10 MB, or maybe even smaller… For sure it's more of a workaround, but it could also optimize your project. Again, I have not done such things before, but it should work, as ex_aws provides a way to read configuration from the Amazon CLI files.
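In ex_aws itself a chunked upload could look something like this sketch (the bucket and paths are placeholders; S3.Upload.stream_file/1 reads the file in 5 MB chunks by default, adjustable via :chunk_size):

alias ExAws.S3

# Stream the file in chunks and let S3.upload drive the multipart upload API,
# so no single HTTP request carries the whole payload.
"local/big_file.zip"
|> S3.Upload.stream_file()
|> S3.upload("my-bucket", "uploads/big_file.zip", max_concurrency: 4)
|> ExAws.request!()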

Could it be related to the socket or to request processing? The reason I think so is that as soon as I see the message below:

Could not get response
Error: socket hang up

All the processing stops for sure, even though there is no warning or anything.
Note: last time the POST request hung up after 6-7 minutes, although I have the 25-minute timeout option almost everywhere.

I'm of course not sure, but I asked myself: if there is a problem with a 20 MB file, then how would it work if we sent exactly the same file in smaller chunks?

OK, I see that most probably I would not be able to reproduce this… I have 100/100 Mbps fiber internet, so sending a 20 MB file would definitely not take that much time.

I saw similar problems in other libraries (other languages), and there, with exactly the same error, people found that the configuration was incorrect, but that does not look like your use case…

We upload a 50 MB zipped file for testing purposes. Potentially the file can be as big as 3 GB, at least according to the production requirements. We have to unzip the file and upload each individual image/document, which normally isn't bigger than a few MB.
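For context, the unzip-then-upload step is roughly this sketch (the paths and the :image file type are placeholders for our real values):

# Erlang's :zip extracts the archive into a working directory and returns
# the extracted paths (as charlists).
{:ok, paths} = :zip.unzip(String.to_charlist(zip_path), cwd: String.to_charlist(tmp_dir))

paths
|> Enum.map(&to_string/1)
|> Enum.map(fn path -> {:image, path, Path.basename(path)} end)
|> Task.async_stream(&upload_file(&1, images_upload), max_concurrency: 4, timeout: 1_500_000)
|> Stream.run()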

Configurations for which part? (Note: I think I have shared plenty of configs above.)

Never mind, I just looked at similar issues, and in other libraries people were, for example, declaring a port where it was not needed.

Just for example:

This shows it's really hard to guess what the problem is. There may be many reasons for it.

Hmm… on the other hand, in this issue:


people are talking about changing request headers…

I don't have an AWS-based project right now. Do you think you could try those suggestions? Maybe AWS does not support some headers, like those mentioned in the issue…

Maybe it would be easier to compare a working request (like curl) with the current one?

Thanks for the help, but it looks like the connection is being closed before the response is even sent back to the client. I tried a few combinations of the configs below:

config :cybord, CybordWeb.Endpoint,
  url: [host: "localhost"],
  http: [
    port: 4000,
    protocol_options: [
      # request_timeout: 60000,
      inactivity_timeout: 60000, # 60,000 milliseconds equals one minute
      max_request_line_length: 99_999_999_999,
      idle_timeout: :infinity
    ]
  ],

If I set inactivity_timeout: :infinity, it will process all the files but will not send back a response even after the processing finishes. I found this article helpful, but even it doesn't give details about the possible protocol options.

As soon as all tasks are processed via async_stream (I have also tried chunks of 4 with Task.await, as well as Task.async_stream), I need to insert some records into the DB and send back the response. So in my opinion the solution lies in the HTTP protocol options, but as the documentation is not very clear, I can't figure out how to achieve this. Thanks in advance for all the effort and help @Eiji

Related article: Dealing with long-running HTTP Requests and Timeouts in Phoenix
and options: Nine Nines: cowboy_http(3)
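One way to sidestep the Cowboy timeouts entirely, in the spirit of that article, is to accept the request, run the work in the background, and reply immediately. A sketch (the module and function names are placeholders, and MyApp.TaskSupervisor is assumed to be in the supervision tree):

def create(conn, params) do
  # Run the long unzip + upload + DB insert outside the request process,
  # so no idle/inactivity timeout applies to it.
  Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
    process_and_upload(params)
  end)

  # Answer right away; the client polls or is notified when the work is done.
  send_resp(conn, 202, "upload accepted")
end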

OK, so most probably it's not about the content size, unless there is some limit on S3, but I don't think it's that, as I believe it would return a proper error response…

I have one more idea… Can you please compare the result of:

S3.put_object(component_bucket(), destination_path, file)

with this documentation:

If it's not that, then I don't think it's possible to debug this properly, at least from your app… Maybe we would need to debug some values in the ex_aws code to get more information about the request…
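For that comparison, something as simple as this sketch might be enough; it dumps the %ExAws.Operation.S3{} struct (HTTP method, path, headers, body) that put_object builds, before the request is sent:

S3.put_object(component_bucket(), destination_path, file)
|> IO.inspect(label: "put_object operation")   # compare against the S3 PutObject docs
|> ExAws.request!()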