S3 operations returning HTTP 505 error

Given a particularly devious S3 object key, e.g.

"users/f8db13a0-e2b8-45cd-a902-5eede074eb46/~*\\|:\"<> +`!@#$%^&()-=_[]{};',.?🧪//fakedata~*\\|:\"<> +`!@#$%^&()-=_[]{};',.?🧪.txt"

I'm running into HTTP 505 errors from the ExAws client, e.g.

    bucket
    |> ExAws.S3.head_object(object_key)
    |> ExAws.request()

I can see the objects there if I run ExAws.S3.list_objects/2 or ExAws.S3.list_objects_v2/2

A Python script uploaded the file (probably using the official boto library), but I'm failing to retrieve it.

Frustratingly, the AWS docs about 5xx errors do not mention 505 errors anywhere. There are few other mentions of this error (e.g. this one).

Because I'm able to interact with other "normally named" objects in the same bucket, the culprit is likely the weird object key. Has anyone else encountered this? Is this perhaps a bug in ExAws.S3?


This key contains characters that should be avoided according to the AWS docs: Creating object key names - Amazon Simple Storage Service. I don't expect ExAws.S3 to do a good job handling characters that AWS S3 itself says not to use.

The 505 error code is defined by RFC 2616 as "HTTP Version Not Supported".

My guess is that this error is being triggered by the unescaped space character in that key - the request would look something like

HEAD users/f8db13a0-e2b8-45cd-a902-5eede074eb46/~*\|:"<> +`!@#$%^&()-=_[]{};',.?🧪//fakedata~*\|:"<> +`!@#$%^&()-=_[]{};',.?🧪.txt HTTP/1.1

so the part of the key after the space is being interpreted as the requested HTTP version, and the S3 server is replying with a 505 because it expects the request line to be <method> <url> <version>.

For sure, these are intentionally problematic keys, but they do represent possible values, and we can't control what files the Python users might lob our way.

This seems plausible – some of the other posts I've found describing similar errors seem to have something to do with spaces and encoding. I'm not sure how to escape the space in this case… adding a backslash or replacing it with + didn't seem to work. Thoughts?

I'm definitely feeling like the better option is to just reject any file names like that in our app. Even if they were "legitimately" created elsewhere (e.g. using boto), our app can't handle them in any practical sense.

IIRC the +-as-space escape sequence only works in query strings; try %20 instead
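The difference is easy to see with Elixir's own URI module: URI.encode_www_form/1 produces the +-style form encoding meant for query strings, while URI.encode/2 with the unreserved-character predicate produces the %20-style percent-encoding a path segment needs. A quick sketch (the filename is just an example):

```elixir
# Form encoding, meant for application/x-www-form-urlencoded
# query strings: the space becomes "+"
URI.encode_www_form("fake data.txt")
# => "fake+data.txt"

# Percent-encoding everything except RFC 3986 unreserved characters,
# suitable for a path segment: the space becomes "%20"
URI.encode("fake data.txt", &URI.char_unreserved?/1)
# => "fake%20data.txt"
```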

I fixed some encoding in ex_aws a few months ago around encoding spaces, which didn't work with MinIO. Maybe that's related: Use percent encoding instead of www form for header by LostKobrakai · Pull Request #184 · ex-aws/ex_aws_s3 · GitHub

The proper fix/workaround is to store objects under GUID keys in S3 only (with the actual file name stored in the database) and then use Content-Disposition to emit the correct file name later.
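A minimal sketch of that design, assuming ex_aws_s3 and Ecto are available – the module, bucket name, and persistence step are all hypothetical, but ExAws.S3.presigned_url/5 and its :query_params option are real:

```elixir
defmodule MyApp.Uploads do
  @bucket "my-bucket"  # assumed bucket name

  # Store the object under an opaque UUID key; keep the user-supplied
  # name only as a column in our own database, never in the S3 key.
  def store_upload(binary, _original_filename) do
    key = "uploads/" <> Ecto.UUID.generate()

    {:ok, _} =
      @bucket
      |> ExAws.S3.put_object(key, binary)
      |> ExAws.request()

    # e.g. Repo.insert!(%Upload{key: key, filename: original_filename})
    {:ok, key}
  end

  # Re-attach the original filename only at download time, via the
  # response-content-disposition override on a presigned GET URL.
  def download_url(key, original_filename) do
    disposition = ~s(attachment; filename="#{original_filename}")

    ExAws.Config.new(:s3)
    |> ExAws.S3.presigned_url(:get, @bucket, key,
      query_params: [{"response-content-disposition", disposition}]
    )
  end
end
```

This way the hostile characters never touch the request line at all – they only ever travel inside a percent-encoded query parameter.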

As @LostKobrakai discussed, you can consider sanitising/encoding the characters or changing them to a hex representation; however, there is no standardised way to pass such string literals within JSON, and there may be significant work required to handle these files.

I think the only place in ex_aws to change would be to do this encoding/decoding transparently (as a breaking change) in ExAws.Operation.S3.add_bucket_to_path/2, or perhaps to go a step further and use XML for outgoing requests (so you get proper string literals that can represent the object names you need). However, since that would create such a huge support burden for the maintainers, is already discouraged by AWS, and would encourage further bad application design, I don't think it will be considered AT ALL.

So my best suggestion is to change the design as previously mentioned, to obviate the need for this rigmarole. It is also the right thing to do: if you sanitise the names with some kind of URI-encoded string, you avoid a whole class of security bugs, such as a user supplying a file name that effectively results in a path-traversal exploit. The bigger the attack surface, the bigger your risk.
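If you do decide to reject hostile names at the boundary instead, a small whitelist check is enough. The allowed character set below is an assumption you would tune to your app's policy:

```elixir
defmodule SafeKey do
  # Conservative whitelist: letters, digits, dot, dash, underscore.
  # Everything else (spaces, quotes, "/", backslashes, emoji) is
  # rejected outright; "." / ".." are excluded to block path traversal.
  @allowed ~r/\A[A-Za-z0-9._-]+\z/

  def valid?(name) do
    Regex.match?(@allowed, name) and
      name not in [".", ".."] and
      not String.contains?(name, "..")
  end
end

SafeKey.valid?("report-2024_v1.txt")  # => true
SafeKey.valid?("../../etc/passwd")    # => false
SafeKey.valid?("weird name?.txt")     # => false
```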
