Mime type detection by magic numbers

Hey,

Does anyone know of an Elixir module that detects the correct MIME type for uploaded files?

I have some issues in my Phoenix project. For example, Firefox sets the MIME type for PDF files to
“application/download” instead of “application/pdf”.

So I’m looking for a module, or help writing a module, that can detect this from [file signatures](http://www.garykessler.net/library/file_sigs.html).

Any advice is welcome.
Thanks
Andy


If you are on a Linux host, you can probably use `file` (with the `--mime-type` option) or `mimetype`.

If there is a package available on hex, I’d guess it wraps one of these.
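As a rough idea of what such a wrapper could look like, here is a minimal sketch. The module name `MimeProbe` is made up for illustration and is not an existing package; it assumes the `file` utility is installed and on the PATH.

```elixir
defmodule MimeProbe do
  # Hypothetical helper module, not part of any published package.

  @doc "Asks the system `file` utility for the MIME type of `path`."
  def mime_type(path) do
    # --brief drops the leading "filename: " prefix from the output.
    # System.cmd/2 raises if the `file` executable cannot be found.
    case System.cmd("file", ["--brief", "--mime-type", path]) do
      {output, 0} -> {:ok, String.trim(output)}
      {output, status} -> {:error, {status, String.trim(output)}}
    end
  end
end
```

Calling `MimeProbe.mime_type("upload.pdf")` should then return an `{:ok, mime}` tuple whose value depends on the magic database installed on the host.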


That looks good. Thanks!


Is Dynamo’s mime library what you’re looking for?

A simple Elixir library that maps MIME types to file extensions and vice versa.

As I understand it, @murphy wants to check the magic number of the file instead of relying on the file extension (which can be as wrong as whatever MIME type a browser decides to set).

Also, because of this thread, I hacked together a wrapper, which is available as file_info on hex.pm. It is very basic right now, but I do plan to make it better.


Ah right, sorry, I didn’t look into the library and wasn’t aware it was relying on file extensions.

Good on you for working on a wrapper!


Something interesting would be to extract the header of the file and pattern-match it against a pre-defined map of type <-> headers.
If someone wants to play with bitstrings… :stuck_out_tongue:
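To make the idea concrete, here is a minimal sketch using binary pattern matching in function heads. The module name `MagicSniffer` is hypothetical, and the list covers only a handful of well-known signatures that all start at offset 0; a real table would be far larger.

```elixir
defmodule MagicSniffer do
  # Hypothetical module name; the signatures below are a tiny, well-known subset.

  # PDF files start with the ASCII bytes "%PDF-".
  def sniff(<<"%PDF-", _rest::binary>>), do: "application/pdf"
  # PNG files start with a fixed 8-byte signature.
  def sniff(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _rest::binary>>), do: "image/png"
  # JPEG files start with the bytes FF D8 FF.
  def sniff(<<0xFF, 0xD8, 0xFF, _rest::binary>>), do: "image/jpeg"
  # GIF files start with "GIF87a" or "GIF89a".
  def sniff(<<"GIF87a", _rest::binary>>), do: "image/gif"
  def sniff(<<"GIF89a", _rest::binary>>), do: "image/gif"
  # Anything else falls back to the generic binary type.
  def sniff(_other), do: "application/octet-stream"

  @doc "Reads only the first 16 bytes of the file at `path` and sniffs them."
  def sniff_file(path) do
    File.open!(path, [:read, :binary], fn io ->
      case IO.binread(io, 16) do
        header when is_binary(header) -> sniff(header)
        # An empty file (:eof) or a read error is treated as unknown content.
        _ -> sniff(<<>>)
      end
    end)
  end
end
```

With this, `MagicSniffer.sniff_file(path)` ignores both the file extension and whatever the browser sent, and only looks at the leading bytes.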


As you can see in issue #4, my long-term plan is to remove the need for `file` to be available and to do the tests directly on the Elixir side, but I think it will take quite a while, since there are many MIME types with many magic numbers.

First, I’d need a reliable source of such magic numbers and their mapping to MIME types.


Is this any good?
https://freedesktop.org/wiki/Software/shared-mime-info/

Yes, I was thinking that maybe someone has already done this.
But I’m working on Linux anyway, so many thanks to @NobbZ for the wrapper.

I will take a closer look later this month to see whether this can help.

I have thought about this, but I don’t think that a sequential pattern match is fast enough. Since these magic numbers are of different lengths, and you only know the length after you have had a successful match, I think some kind of trie would be needed.
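A minimal sketch of such a trie, assuming every signature starts at offset 0 (real magic databases also contain signatures at other offsets, which this does not handle). The module name `MagicTrie` and its functions are made up for illustration. Each node is a plain map from byte to child node, with a `:mime` key on nodes that complete a signature.

```elixir
defmodule MagicTrie do
  # Hypothetical module; nodes are plain maps: byte => child node, :mime => type.

  @doc "Builds a trie from a list of `{signature_binary, mime_type}` pairs."
  def new(signatures) do
    Enum.reduce(signatures, %{}, fn {sig, mime}, trie ->
      insert(trie, :binary.bin_to_list(sig), mime)
    end)
  end

  # End of the signature: mark this node as a complete match.
  defp insert(current, [], mime), do: Map.put(current, :mime, mime)

  defp insert(current, [byte | rest], mime) do
    child = Map.get(current, byte, %{})
    Map.put(current, byte, insert(child, rest, mime))
  end

  @doc "Returns the MIME type of the longest signature that prefixes `data`, or nil."
  def lookup(trie, data) when is_binary(data), do: walk(trie, data, nil)

  # Input exhausted: the current node may still complete a signature.
  defp walk(current, <<>>, best), do: Map.get(current, :mime, best)

  defp walk(current, <<byte, rest::binary>>, best) do
    # Remember the most recent node that completes a signature.
    best = Map.get(current, :mime, best)

    case Map.fetch(current, byte) do
      {:ok, child} -> walk(child, rest, best)
      :error -> best
    end
  end
end
```

The lookup touches each header byte at most once, so its cost depends on the length of the longest signature rather than on how many signatures are registered.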


@Nobbz, see “checktype_contents” in this file
http://cpansearch.perl.org/src/PMISON/File-Type-0.22/lib/File/Type.pm


Quick note that @NobbZ has been hard at work on file_info over the last couple of weeks. Thanks, @NobbZ !


Thanks! @Nobbz :grinning:


Hey, why might I be getting this error when I try to use the package?

** (MimetypeParser.Exception) illegal characters "`"
        (mimetype_parser) lib/mimetype_parser.ex:14: MimetypeParser.parse!/2
        (file_info) lib/file_info/mime.ex:25: FileInfo.Mime.parse!/1
        (file_info) lib/file_info.ex:39: FileInfo.to_tuple/1
        (elixir) lib/stream.ex:565: anonymous fn/4 in Stream.map/2
        (elixir) lib/enum.ex:3317: Enumerable.List.reduce/3
        (elixir) lib/stream.ex:1568: Enumerable.Stream.do_each/4
        (elixir) lib/enum.ex:3015: Enum.reverse/1
        (elixir) lib/enum.ex:2647: Enum.to_list/1
        (elixir) lib/map.ex:181: Map.new_from_enum/1

Because there is some invalid character in the MIME type. Please check the result of calling `file --mime-type` on the file in question. Also, please feel free to open a bug report at https://gitlab.com/NobbZ/file_info/issues.
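If you want to check the raw output from inside the app, something like the following sketch might help. `MimeOutputCheck` is a hypothetical helper, not part of file_info or mimetype_parser. It assumes the usual `name: type/subtype` line format, that the file name itself contains no `": "`, and that a valid type uses only the restricted-name characters from RFC 6838.

```elixir
defmodule MimeOutputCheck do
  # Hypothetical helper, not part of file_info or mimetype_parser.

  # Assumption: a plain "type/subtype" built from RFC 6838 restricted-name characters.
  defp mime_regex do
    ~r{\A[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*\z}
  end

  @doc "Splits one line of `file --mime-type` output and validates the type."
  def check(line) do
    case String.split(String.trim(line), ": ", parts: 2) do
      [name, mime] ->
        if Regex.match?(mime_regex(), mime) do
          {:ok, name, mime}
        else
          {:error, name, mime}
        end

      # No "name: " prefix at all, so hand back the whole line for inspection.
      [other] ->
        {:error, nil, other}
    end
  end
end
```

Feeding it the output captured inside the running app (for example via `System.cmd("file", ["--mime-type", path])`) and calling `IO.inspect/1` on an `{:error, ...}` result should make any stray character visible.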

If I go into the folder and call it there, it works:

image/jpeg

Is that the full output? There has to be something else confusing the system… And is that the same machine as the one you have the problem with, or is it another machine you are trying to upload from?

The full output is

random-grid.pdf: image/jpeg

It is the same machine, but the project is running inside a Docker container.

Does the result change when you run it in the Docker container, injecting the same set of --env values as when running the app?