Blog Post: Can Phoenix Safely use the Zip Module?

Elixir has access to a built-in zip library that ships with OTP. This post explores how to use the zip module and asks the important question: “Is this safe to use with user-provided zips?” We explore two different types of zip-based attacks and see what we can learn from them.

Nice post! The zip format is full of dark corners like this.

Another fun fact: unlike most other formats, zip keeps its file index (the central directory) at the end of the file. That means you can delete or rename files just by appending a new index, without touching the file data: take the existing index, update it, and append it to the end of the file. According to the spec, the result is a valid file. Many tools abuse this for various “features”, and it is also how most “recover deleted files” features work.
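The append-a-new-index trick can be sketched as follows. This uses Python's standard `zipfile` and `struct` modules purely for illustration (it is not the Elixir API), and `hide_entry` is a hypothetical helper name. It assumes a small archive with no archive comment and no ZIP64 records, so the end-of-central-directory record is exactly the last 22 bytes.

```python
import io
import struct
import zipfile

# End-of-central-directory record: signature, 4 shorts, 2 longs, comment length.
EOCD = struct.Struct("<4s4H2LH")


def hide_entry(archive: bytes, hidden: str) -> bytes:
    """Append a new index that omits `hidden`; the original bytes stay untouched."""
    # Assumption: no archive comment and no ZIP64, so the EOCD is the last 22 bytes.
    sig, _, _, _, total, _cd_size, cd_offset, _ = EOCD.unpack(archive[-EOCD.size:])
    if sig != b"PK\x05\x06":
        raise ValueError("end-of-central-directory record not found")

    kept, pos = [], cd_offset
    for _ in range(total):
        # Each central directory entry is 46 fixed bytes plus three variable parts.
        name_len, extra_len, comment_len = struct.unpack_from("<3H", archive, pos + 28)
        end = pos + 46 + name_len + extra_len + comment_len
        name = archive[pos + 46 : pos + 46 + name_len].decode("utf-8")
        if name != hidden:
            kept.append(archive[pos:end])
        pos = end

    new_cd = b"".join(kept)
    # The new index starts where the old file ended.
    new_eocd = EOCD.pack(
        b"PK\x05\x06", 0, 0, len(kept), len(kept), len(new_cd), len(archive), 0
    )
    return archive + new_cd + new_eocd


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:  # stored (uncompressed) by default
    zf.writestr("a.txt", b"public")
    zf.writestr("b.txt", b"secret")
original = buf.getvalue()

patched = hide_entry(original, "b.txt")
listed = zipfile.ZipFile(io.BytesIO(patched)).namelist()
# `listed` no longer contains b.txt, yet its bytes are still inside `patched`.
```

A reader that trusts the index sees only `a.txt`, while the data for `b.txt` is still sitting in the file for anything that scans the raw bytes.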

IMO one way to reduce the attack surface is to avoid touching the file system altogether if you don’t need to. You can keep the files in memory if they are small, stream them if they don’t fit in memory, or extract only the file you need.

If you are automating something that requires reading from a zip, you likely already know the filenames or the structure, so you can fetch just the ones you need.
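As a rough sketch of the "fetch only what you need, in memory" idea, here is what that might look like with Python's standard `zipfile` module (again only an illustration, not the Elixir API). The helper name `read_member`, the member name `config.json`, and the 1 MB default limit are all made up for the example.

```python
import io
import zipfile


def read_member(archive: bytes, name: str, max_bytes: int = 1_000_000) -> bytes:
    """Read one known member into memory without extracting anything to disk."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)  # raises KeyError if the member is not listed
        if info.file_size > max_bytes:
            raise ValueError(f"{name} declares {info.file_size} bytes, limit is {max_bytes}")
        with zf.open(info) as member:
            # Defensive: never read more than the limit, whatever the header declares.
            data = member.read(max_bytes + 1)
        if len(data) > max_bytes:
            raise ValueError(f"{name} is larger than {max_bytes} bytes")
        return data


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("config.json", b'{"ok": true}')
    zf.writestr("../outside.txt", b"path traversal attempt")
archive = buf.getvalue()

config = read_member(archive, "config.json")
```

Because the member is looked up by a name you chose and returned as bytes, the suspicious `../outside.txt` entry is never turned into a file system path.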

There are mainly two approaches to streaming files from a zip. Since both use streams and don’t write or extract anything to the file system, neither is susceptible to path traversal attacks:

  • Read the zip from beginning to end. Example: zstream
    Since you can’t really stream in reverse (or move to an arbitrary position), this approach is not susceptible to zip bombs. The code does not even need to be aware of them!

    Cons:

    • because this approach relies on the local file headers, which according to the spec may leave out the sizes, you won’t be able to parse every zip file (especially ones that were written in a streaming way)
    • since the approach does not strictly follow the spec, you might see ghost files, duplicate files, or files that do not show up when you open the zip in a GUI
    • you cannot selectively extract a single file efficiently
  • Use a seek & read API. Example: unzip (full disclosure, I am the author)
    This approach might be susceptible to zip bombs if the library does not explicitly handle them (unzip does handle them). Since it follows the spec, it should work with all types of zip files.
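To make the first approach and its cons more concrete, here is a minimal sketch of a front-to-back scan over local file headers, written in Python for illustration only. It is not how zstream is implemented; `scan_local_headers` and `make_archive` are hypothetical names, and the sketch assumes no ZIP64 sizes and UTF-8 compatible file names.

```python
import io
import struct
import zipfile

# 30-byte local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def scan_local_headers(stream):
    """Yield (name, compressed_size) using forward reads only, never seeking."""
    while True:
        raw = stream.read(LOCAL_HEADER.size)
        if len(raw) < LOCAL_HEADER.size or raw[:4] != b"PK\x03\x04":
            return  # reached the central directory or the end of the data
        _, _, flags, _, _, _, _, csize, _, name_len, extra_len = LOCAL_HEADER.unpack(raw)
        if flags & 0x08:
            # Bit 3: the sizes are stored in a data descriptor after the data,
            # so this simple scan cannot tell where the entry ends.
            raise ValueError("entry sizes are not available in the local header")
        name = stream.read(name_len).decode("utf-8")
        stream.read(extra_len)  # skip the extra field
        stream.read(csize)      # a real reader would decompress this instead
        yield name, csize


def make_archive(files):
    """Build a small in-memory archive for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


archive = make_archive({"a.txt": b"alpha", "b.txt": b"beta"})
names = [name for name, _ in scan_local_headers(io.BytesIO(archive))]
```

The scan never consults the index at the end of the file, so it reports every local header it walks past, whether or not the index still lists that entry. That is where ghost and duplicate files come from, and it is also why the scan has to give up on archives whose sizes live in a trailing data descriptor.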