URI.decode fails on some URLs with embedded spaces and special chars

Please, how can we URL-decode the following properly?

s = "https://172.16.1.101/gateway/media/W/document/1635952303392_25%%20LUMPSUM%20APPLICATION%20FORM%20NEW.pdf"

Expected is: https://172.16.1.101/gateway/media/W/document/1635952303392_25% LUMPSUM APPLICATION FORM NEW.pdf

Tested here: https://www.url-encode-decode.com/

but I get this:

> URI.decode s
** (ArgumentError) malformed URI "https://172.16.1.101/gateway/media/W/document/1635952303392_25%%20LUMPSUM%20APPLICATION%20FORM%20NEW.pdf"
    (elixir 1.12.3) lib/uri.ex:419: URI.decode/1
>

Please, is there an alternative to URI.decode in this instance?

Thanks.

Where you have ... _25%%20 ... in your string, I think the literal % (percent sign) needs to be sent as %25; otherwise it is treated as the start of an escape sequence. I.e. it should be ... _25%25%20 ...
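If the incoming strings can't be fixed at the source, one possible workaround on Elixir 1.12 is to apply that %25 substitution yourself before decoding. The following is only a minimal sketch: the module name LenientDecode is hypothetical (it is not part of Elixir), and it assumes that any % not followed by two hex digits was meant literally. A literal % that happens to be followed by two hex digits would still be decoded as an escape.

```elixir
defmodule LenientDecode do
  # Hypothetical helper, not part of Elixir's standard library.
  # Assumption: a "%" that is NOT followed by two hex digits is a literal
  # percent sign, so it is rewritten to "%25" before the normal decode.
  def decode(string) do
    string
    |> String.replace(~r/%(?![0-9A-Fa-f]{2})/, "%25")
    |> URI.decode()
  end
end

s = "https://172.16.1.101/gateway/media/W/document/1635952303392_25%%20LUMPSUM%20APPLICATION%20FORM%20NEW.pdf"
IO.puts(LenientDecode.decode(s))
# prints: https://172.16.1.101/gateway/media/W/document/1635952303392_25% LUMPSUM APPLICATION FORM NEW.pdf
```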

Thanks. Sadly, the client outputs their files in this way; I was hoping URI.decode would be able to handle these edge cases correctly.

The website I linked is somehow able to cope with these badly formatted URLs.

But I will try your suggestion.

Yeah I tried that site as well. If you try the encode instead of the decode it does add the %25 for the percentage symbol so I guess it’s hiding that part from you when it decodes.

I don’t believe there’s a reliable way to do this right - the generating service should not be producing invalid URLs. How would it encode a filename like “this has a literal followed by digits %45 what happens”?

Based on your example, the client would encode this as

https://172.16.1.101/gateway/media/W/document/this%20has%20a%20literal%20followed%20by%20digits%20%45%20what%20happens

which a standard URL-decoder will read as

https://172.16.1.101/gateway/media/W/document/this has a literal followed by digits E what happens
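To see that ambiguity concretely, here is a small check of just the filename part (the variable name encoded is only for illustration). Since %45 is a well-formed escape for the byte 0x45, the decoder has no way to tell that a literal "%45" was intended:

```elixir
# The file name as the client would (incorrectly) encode it:
# spaces become %20, but the literal "%" is left alone.
encoded = "this%20has%20a%20literal%20followed%20by%20digits%20%45%20what%20happens"

# %45 is a valid percent-escape for "E", so it is decoded rather than kept.
IO.puts(URI.decode(encoded))
# prints: this has a literal followed by digits E what happens
```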

The files are being uploaded to a folder. The filename actually is like “doc_ 25% pension fund payment.pdf”

I’m just trying to serve them from that folder
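For comparison, if the links were built on the Elixir side from the raw filename, URI.encode would escape the percent sign as well, so the result round-trips cleanly. This is only a sketch and assumes the name on disk is exactly the one quoted above:

```elixir
# File name as it sits in the upload folder (taken from the example above).
filename = "doc_ 25% pension fund payment.pdf"

# URI.encode escapes the space as %20 and the literal "%" as %25.
encoded = URI.encode(filename)
IO.puts(encoded)
# prints: doc_%2025%25%20pension%20fund%20payment.pdf

# Decoding gives back the original name without any ambiguity.
IO.puts(URI.decode(encoded) == filename)
# prints: true
```

Note that URI.encode leaves reserved characters such as "/" untouched; for a single path segment, passing &URI.char_unreserved?/1 as the second argument would escape those too.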

This seems to be fixed in v1.13.0-rc.0

From the release notes:
[URI] Only percent decode if followed by hex digits (according to URL Standard)

Erlang/OTP 24 [erts-12.1.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Interactive Elixir (1.13.0-rc.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> s = "https://172.16.1.101/gateway/media/W/document/1635952303392_25%%20LUMPSUM%20APPLICATION%20FORM%20NEW.pdf"
"https://172.16.1.101/gateway/media/W/document/1635952303392_25%%20LUMPSUM%20APPLICATION%20FORM%20NEW.pdf"
iex(2)> URI.decode(s)
"https://172.16.1.101/gateway/media/W/document/1635952303392_25% LUMPSUM APPLICATION FORM NEW.pdf"

:slight_smile:

Awesome :+1::sunglasses:
Moving over to 1.13 ASAP
