iangreenleaf
Handling invalid UTF-8 strings from URL path params
Here’s the problem I’m working on: bots probing my site for vulnerabilities will try injecting special character sequences into params, like /articles/abc%DE~%C7%1FY, which becomes the binary <<97, 98, 99, 222, 126, 199, 31, 89>>, which is not a valid string.
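To make the failure concrete, here is a small sketch that decodes the example segment by hand and checks it. It uses only the standard library (`URI.decode/1` and `String.valid?/1`); the variable name `decoded` is just for illustration.

```elixir
# Percent-decode the path segment from the example request.
decoded = URI.decode("abc%DE~%C7%1FY")

# Print the raw bytes rather than trying to render them as text.
IO.inspect(decoded, binaries: :as_binaries)
# prints <<97, 98, 99, 222, 126, 199, 31, 89>>

# String.valid?/1 reports whether a binary is well-formed UTF-8.
IO.inspect(String.valid?(decoded))
# prints false
```

Byte 222 (0xDE) starts a two-byte UTF-8 sequence, but the next byte, 126, is not a continuation byte, so the binary is rejected.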
No doubt this attack targets Oracle Server 2003 or something, I don’t know. It’s not going to cause any harm to my app, but it does end up triggering a Postgrex error because the invalid binary makes it all the way into the SELECT query before being rejected as invalid UTF-8.
I’d like to catch this earlier and return an appropriate 4xx error for invalid input rather than a 500 error when the DB query fails. Plug.Parsers has an option to validate UTF-8 in body and query params, so that a request like /articles/abc?a=b%DE~%C7%1FY will raise a relevant exception, but it seems like the path params aren’t checked in the same way.
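For comparison, this is roughly what the query-string check looks like when called directly. It is a sketch that assumes the `plug` package is available (pulled in here with `Mix.install/1`, Elixir 1.12+) and that the check surfaces as `Plug.Conn.InvalidQueryError`, which is what I understand `Plug.Conn.Query.decode/1` to raise with its default UTF-8 validation.

```elixir
# Assumption: run as a standalone script; inside an app, plug is already a dependency.
Mix.install([:plug])

result =
  try do
    # Same query string as in the example request above.
    Plug.Conn.Query.decode("a=b%DE~%C7%1FY")
  rescue
    # Raised when the decoded value is not valid UTF-8.
    e in Plug.Conn.InvalidQueryError -> {:rejected, Exception.message(e)}
  end

IO.inspect(result)
```

Nothing equivalent runs over the path segments, which is why the bad binary from the path reaches the query.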
I’m not sure how to attack this problem. I don’t want to add a check individually to every controller, since this is an application-wide need. Should the path params be run through the same parser checks as other params, or is there a reason they aren’t?
Most Liked
malaire
These “Living Standards” seem to be quite new, but the W3C does say that for HTML the WHATWG standard is the current standard:
HTML Standard is the current HTML standard. It obsoletes all other previously-published HTML specifications.
And that WHATWG HTML Standard refers to this URL Standard.
In its Goals section, the URL Standard also says that one of its goals is to obsolete RFC 3986 and RFC 3987.
NobbZ
You could add a plug which checks the :request_path. Something like this:
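# Function plug; register it with `plug :validate_request_path`,
# since the `plug` macro takes a function name or module, not an anonymous function.
def validate_request_path(conn, _opts) do
  if String.valid?(conn.request_path) do
    conn
  else
    # Send the (empty) response before halting so the client gets the status.
    conn
    |> Plug.Conn.send_resp(:im_a_teapot, "")
    |> Plug.Conn.halt()
  end
end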
This is a quick draft based on the docs. You might want to adjust some parts of it, add some content, make it a module-based plug, or change the status code sent.
Also, this code assumes that the :request_path is already decoded at this point. If it is not, you can use URI.decode/1 to do so.
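If the path turns out not to be decoded yet, a module-based version could decode each segment itself before checking it. The following is only a sketch: the module name `ValidatePathEncoding`, the 400 status and the response body are my own choices, and it assumes `conn.path_info` still holds the percent-encoded segments.

```elixir
# Assumption: run as a standalone script; inside an app, plug is already a dependency.
Mix.install([:plug])

defmodule ValidatePathEncoding do
  @moduledoc "Hypothetical plug: rejects requests whose decoded path is not valid UTF-8."
  @behaviour Plug
  import Plug.Conn

  @impl true
  def init(opts), do: opts

  @impl true
  def call(conn, _opts) do
    if Enum.all?(conn.path_info, &valid_segment?/1) do
      conn
    else
      # Respond with 400 and stop the pipeline before any router or controller runs.
      conn
      |> send_resp(:bad_request, "Invalid UTF-8 in request path")
      |> halt()
    end
  end

  # Percent-decode one segment and check that the result is well-formed UTF-8.
  defp valid_segment?(segment) do
    String.valid?(URI.decode(segment))
  rescue
    # Some Elixir versions raise on malformed escapes such as "%ZZ"; treat those as invalid.
    ArgumentError -> false
  end
end

# Quick check using Plug.Test, with the path from the original question.
bad = ValidatePathEncoding.call(Plug.Test.conn(:get, "/articles/abc%DE~%C7%1FY"), [])
good = ValidatePathEncoding.call(Plug.Test.conn(:get, "/articles/abc"), [])
IO.inspect({bad.status, bad.halted, good.halted})
```

In a Phoenix app I would expect `plug ValidatePathEncoding` to go in the endpoint, ahead of the router, so the check applies application-wide instead of per controller.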
malaire
The current URL Standard seems to be based on valid UTF-8 encoding.







