Library for parsing and building RFC-compliant* Content-Disposition headers

I’ve noticed that whenever I need to set Content-Disposition (see MDN) manually, the usual advice is to use URI.encode_www_form or sometimes URI.encode. This usually results in users receiving files with mangled filenames. And when only the filename= parameter is set, the browser may refuse to honor the filename, or drop the extension and half the name.


I want to start off by saying that there is already a library for creating Content-Disposition headers, jeroenvisser101/content_disposition (a helper package for Elixir to generate Content-Disposition values), and my work on the formatting side is mostly an extension of that library (shout-out to @jeroenvisser101).
I also want to give kudos to Julian Reschke aka greenbytes, whose RFC work was uplifting, and the jshttp/content-disposition npm package for their thorough tests.

Ok, returning to the topic…

I’ve also noticed that Plug, in Plug.Parsers.MULTIPART, handles the newer, RFC-recommended “filename*=” parameter, but still gives priority to the older, (US-)ASCII-only “filename=” parameter.

What got me here, looking for an RFC-compliant parser, is a scenario where we have to consume our downloads (why is not important), along with another issue where, by following the old ways, users were ending up with mangled download filenames.

What I’m noticing is that the formatting and parsing logic across multiple libraries each gets something wrong: using an encoding from the wrong RFC (URI.encode_www_form), percent-encoding the “filename=” parameter when the RFCs seem to say not to and to use “filename*=” instead (send_download, and a few other places and libraries), or other parsing and encoding issues.

For example, if you have a file called "my 'secrets' file.txt" and encode it in different ways:

  • URI.encode gives "my%20'secrets'%20file.txt", which causes problems with “filename*=” because the single quote ' is the delimiter between the charset, language, and value in an extended parameter
  • URI.encode_www_form gives "my+%27secrets%27+file.txt", which works but leaves plus signs in the filename, something even the docs point out as a potential downside.

Other issues I’ve run into:

  • A couple of places in different libraries where the wrong set of characters was used for the percent-encoding (the RFCs are quite dense at times, so this is understandable)
  • Plug.Conn.Utils.params doesn’t allow spaces before the “quoted-string” value, but I believe it should
  • A number of places where the charset in filename*= is matched against lowercase “utf-8” only, so “UTF-8” is rejected, as is any value with a language tag, for example filename*=UTF-8'en'some%20file.txt (which should be RFC-compliant)
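
To make the encoding difference concrete, here is a minimal sketch that puts the two standard-library encoders from the list above next to an encoder restricted to the attr-char set from RFC 8187 (the successor to RFC 5987). The module and function names (EncodingSketch, attr_char?/1, ext_value/1) are hypothetical and only for illustration; they are not part of Plug or of content_disposition.

```elixir
defmodule EncodingSketch do
  # Hypothetical helper, not an existing library API.
  # attr-char per RFC 8187: ALPHA / DIGIT /
  # "!" / "#" / "$" / "&" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
  @attr_punctuation ~c"!#$&+-.^_`|~"

  def attr_char?(c) when c in ?a..?z or c in ?A..?Z or c in ?0..?9, do: true
  def attr_char?(c), do: c in @attr_punctuation

  # Builds the ext-value for filename*=, with an empty language tag.
  # URI.encode/2 percent-encodes every byte the predicate rejects,
  # so spaces, single quotes, and all non-ASCII bytes are escaped.
  def ext_value(filename) when is_binary(filename) do
    "UTF-8''" <> URI.encode(filename, &attr_char?/1)
  end
end

name = "my 'secrets' file.txt"

IO.puts(URI.encode(name))
# my%20'secrets'%20file.txt

IO.puts(URI.encode_www_form(name))
# my+%27secrets%27+file.txt

IO.puts(EncodingSketch.ext_value(name))
# UTF-8''my%20%27secrets%27%20file.txt
```

Because the single quote is not an attr-char, it is escaped as %27 and can no longer be confused with the quotes that separate the charset, language, and value.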

What I am proposing, though, is a well-tested library for creating and parsing Content-Disposition headers, one that could be used in Plug and suggested in the docs of different libraries for when you need to extract the filename from a response using your HTTP client of choice.

I am thinking something like jshttp/content-disposition but for Elixir, and potentially using greenbytes’ testing XML.
And it seems like it should be configurable for how “strict” it is, defaulting to not decoding the “filename=” parameter, but being a bit more forgiving when parsing with the “filename*=” parameter.
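
As a rough illustration of what forgiving parsing of the “filename*=” value could look like, here is a minimal sketch that accepts the charset in any letter case and tolerates an optional language tag. ExtValueSketch and parse/1 are hypothetical names, not an existing API, and the sketch assumes UTF-8 is the only charset that needs to be supported.

```elixir
defmodule ExtValueSketch do
  # Hypothetical helper, not an existing library API.
  # Parses an ext-value of the form charset'language'value-chars,
  # e.g. UTF-8'en'some%20file.txt. Only UTF-8 is handled here.
  def parse(ext_value) when is_binary(ext_value) do
    with [charset, language, encoded] <- String.split(ext_value, "'", parts: 3),
         # The charset name is compared case-insensitively.
         "utf-8" <- String.downcase(charset),
         {:ok, decoded} <- safe_decode(encoded),
         # Reject percent-escapes that do not decode to valid UTF-8.
         true <- String.valid?(decoded) do
      {:ok, %{language: language, filename: decoded}}
    else
      _ -> :error
    end
  end

  # Some Elixir versions raise ArgumentError from URI.decode/1 on
  # malformed escapes, so the call is wrapped defensively.
  defp safe_decode(encoded) do
    {:ok, URI.decode(encoded)}
  rescue
    ArgumentError -> :error
  end
end

ExtValueSketch.parse("UTF-8'en'some%20file.txt")
# {:ok, %{language: "en", filename: "some file.txt"}}

ExtValueSketch.parse("utf-8''na%C3%AFve.txt")
# {:ok, %{language: "", filename: "naïve.txt"}}
```

A stricter mode could additionally reject value characters outside the attr-char set; this sketch leaves that out.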

I have created a gist (link below) of some recent work I’ve done on the parsing. It still needs massive amounts of work, and honestly should probably be a PR to @jeroenvisser101’s repository, but I wanted to start a topic here to get people’s insights, and hopefully raise awareness of how inconsistent the implementations and suggestions I’ve seen appear to be.

What do people think?

p.s. In the gist you’ll see MyApp.Downloadable. It is a protocol we use to make it so we never have to write another bespoke /download route again. We just have a controller responsible for each use-case necessary (authorized & single-use, shared & unlimited downloads within the TTL, etc.) and then some clever LiveView components that drop a Downloadable into an appropriate store with a UUID on-click, then finish opening the link using the new UUID. It’s pretty much solved all our download needs for LiveViews and “dead” views. And API users can still use the protocol, just on different controllers than e.g. the “SingleUseController”.


Yes!

I ran into the issue with filenames myself, and tore my hair out before figuring it out :)

To me, those sound like bugs that should be reported to those libraries.