I needed to parse some data: URLs to get at the binary data, and the built-in URI module doesn’t decode them. So here’s a small library for parsing them. If someone wants to add a %DataUrl{} -> to_string() conversion, I’d gladly accept it.
Cool! I did try searching Hex and elsewhere for a library to parse data URLs before making this. I eventually found ex_url, but I saw a few minor issues with it, so I decided to publish data_url anyway. Maybe I should just make a PR to ex_url.
The charset should default to US-ASCII if the mediatype is omitted.
The charset is part of the media type, not a separate thing. If it were to be parsed, I’d prefer a separate MimeType struct.
URL.parse("data:") explodes, but I would prefer an :error.
URL.parse("data:;base64,=") explodes, but I would prefer an :error.
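To make the behaviour I have in mind concrete, here is a rough sketch. The DataUrlSketch module and its return shape are made up for illustration; this is not the actual API of data_url or ex_url. It assumes the RFC 2397 default of text/plain;charset=US-ASCII when the mediatype is omitted, keeps the charset inside the media type string, and returns :error instead of raising on the two inputs above.

```elixir
defmodule DataUrlSketch do
  # Hypothetical module for illustration only; not the data_url or ex_url API.

  # RFC 2397 default when the mediatype is omitted.
  @default_mediatype "text/plain;charset=US-ASCII"

  # Assumes a lowercase "data:" prefix; a real parser may want to be case-insensitive.
  def parse("data:" <> rest) do
    case String.split(rest, ",", parts: 2) do
      [meta, payload] -> decode(meta, payload)
      # No comma means there is no data part at all, e.g. "data:".
      [_no_comma] -> :error
    end
  end

  def parse(_other), do: :error

  defp decode(meta, payload) do
    # A trailing ";base64" is an encoding flag, not part of the media type.
    {mediatype, base64?} =
      if String.ends_with?(meta, ";base64") do
        {String.replace_suffix(meta, ";base64", ""), true}
      else
        {meta, false}
      end

    # The charset parameter stays inside the media type string.
    mediatype = if mediatype == "", do: @default_mediatype, else: mediatype

    with {:ok, decoded} <- percent_decode(payload),
         {:ok, data} <- maybe_base64(decoded, base64?) do
      {:ok, %{mediatype: mediatype, data: data}}
    end
  end

  # URI.decode/1 may raise ArgumentError on malformed escapes,
  # depending on the Elixir version, so turn that into :error.
  defp percent_decode(payload) do
    {:ok, URI.decode(payload)}
  rescue
    ArgumentError -> :error
  end

  # Base.decode64/1 already returns {:ok, binary} or :error.
  defp maybe_base64(decoded, true), do: Base.decode64(decoded)
  defp maybe_base64(decoded, false), do: {:ok, decoded}
end
```

With this sketch, DataUrlSketch.parse("data:") and DataUrlSketch.parse("data:;base64,=") both return :error, while DataUrlSketch.parse("data:,Hello%2C%20World") returns an :ok tuple with the default media type and "Hello, World" as the data.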
On reviewing RFC 6838, it seems like UTF-8 should now be considered the default value of the charset parameter for a mediatype? Your thoughts?
From Section 4.2.1:
If a “charset” parameter is specified, it SHOULD be a required parameter, eliminating the options of specifying a default value. If there is a strong reason for the parameter to be optional despite this advice, each subtype MAY specify its own default value, or alternatively, it MAY specify that there is no default value. Finally, the “UTF-8” charset [RFC3629] SHOULD be selected as the default. See [RFC6657] for additional information on the use of “charset” parameters in conjunction with subtypes of text.
Regardless of what approach is chosen, all new text/* registrations MUST clearly specify how the charset is determined; relying on the US-ASCII default defined in Section 4.1.2 of [RFC2046] is no longer permitted. If explanatory text is needed, this SHOULD be placed in the additional information section of the registration.