I have an Oracle database that stores documents (docx, jpg, pdf, png, etc.) in binary format (BLOB). I need to read each BLOB and convert it to a file of the appropriate format. How can I read and inspect the BLOB in Elixir to determine the file format and write it out as the appropriate file?
Does the database not store information about what kind of file it is? Arbitrary file type identification is quite tricky…
Typically you wouldn’t inspect the BLOB to determine its file type - that information tends to be stored alongside the binary in the database. This is similar to
- an OS associating a certain file extension with a particular file format
- a web response giving a media type in the content-type header.
Determining binary content by inspecting it is usually considered a “forensic” technique.
What do you need the file type for?
Usually you can just write the blob to a temporary location, or stream it straight out to the client without delay and without writing it to disk.
If, though, you are on Linux and really need to know the MIME type for whatever reason, you can write the blob to disk and use the `file_info` package to get the MIME type. Be aware that this uses the `file` executable and shells out to your operating system. The returned info might be inaccurate, but as you only have a restricted set of possible file types it might be accurate “enough”…
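For illustration, here is a minimal sketch of calling the `file` executable directly instead of going through a package. It assumes `file` is on the `PATH` and that the blob has already been fetched from the database as an Elixir binary. The module name `BlobMime` and its function are made up for this example.

```elixir
defmodule BlobMime do
  # Hypothetical helper, not part of any library.
  # Writes the blob to a temp file, asks `file` for its MIME type,
  # and removes the temp file again whether or not the call succeeds.
  def mime_type(blob) when is_binary(blob) do
    name = "blob-#{System.unique_integer([:positive])}"
    path = Path.join(System.tmp_dir!(), name)
    File.write!(path, blob)

    try do
      # --brief drops the file name from the output,
      # --mime-type prints e.g. "application/pdf" instead of a description.
      case System.cmd("file", ["--brief", "--mime-type", path]) do
        {output, 0} -> {:ok, String.trim(output)}
        {output, status} -> {:error, {status, output}}
      end
    after
      File.rm(path)
    end
  end
end
```

Calling `BlobMime.mime_type(blob)` should return something like `{:ok, mime}` for each row. Note that `System.cmd/3` raises if the `file` executable cannot be found at all.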
Unfortunately there’s no data available aside from the blob.
Not even a filename? I’d slap whoever created that database…
I need to migrate the data to a new system, and that system works with the actual files.
I wish I knew the database creator and owner of the app for uploading those documents.
Given these circumstances, I think using `file` directly or through the aforementioned package should give you as much info as you need.
Thanks, I will give it a try.
Yep, you don’t have much hope beyond the `file` tool, which can identify quite a lot of file types for you. Try it.
If it really is only a limited set of things, it wouldn’t be hard to build a simple `cond` test to detect the type from certain mandated markers within (a binary prefix on most of them); then you could fall back to a `file_type` test for more fuzzy/unknown things, if there are any such things left in the DB.
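A rough sketch of that prefix check, for the four types named in the question. It uses pattern matching in function heads rather than a literal `cond`, which tends to be the more idiomatic way to test binary prefixes in Elixir. The module name `BlobSniffer` and the function names are invented for this example, and treating every ZIP container as docx is an assumption that only holds if no other ZIP-based files (xlsx, pptx, plain zip) are in the table.

```elixir
defmodule BlobSniffer do
  # Hypothetical module; each clause matches a well-known leading signature.

  # PNG: 89 50 4E 47 0D 0A 1A 0A
  def extension(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _::binary>>), do: {:ok, "png"}
  # JPEG: FF D8 FF
  def extension(<<0xFF, 0xD8, 0xFF, _::binary>>), do: {:ok, "jpg"}
  # PDF: "%PDF-"
  def extension(<<"%PDF-", _::binary>>), do: {:ok, "pdf"}
  # ZIP local file header "PK\x03\x04"; ASSUMED to be docx here
  def extension(<<"PK", 0x03, 0x04, _::binary>>), do: {:ok, "docx"}
  # Anything else: let the caller fall back to `file`
  def extension(_other), do: :unknown

  # Writes the blob as "<basename>.<ext>" in dir when the type is recognised.
  def write(blob, dir, basename) do
    case extension(blob) do
      {:ok, ext} ->
        path = Path.join(dir, basename <> "." <> ext)
        File.write!(path, blob)
        {:ok, path}

      :unknown ->
        {:error, :unknown_type}
    end
  end
end
```

For example, `BlobSniffer.write(blob, "/tmp/export", "doc_42")` would write `/tmp/export/doc_42.pdf` for a blob starting with `%PDF-`; the directory and base name here are placeholders. Anything that comes back as `{:error, :unknown_type}` is a candidate for the `file` fallback.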