harmon25
PDFTron + Elixir
Hi all,
I have been building a PDFTron nif for a work project, and it is pretty awesome.
We are leveraging both the webview JavaScript library and the C++ code integrated as a nif (nifcpp) for server-side PDF + Office doc processing. PDFTron exposes possibly the best (non-Java) headless docx/pptx/xlsx to PDF conversion I have come across. The only downside is that it is not free; however, they offer a very generous trial/demo to use for development.
Has anyone tried it before?
I posted some questions to slack, but figured I would also ask them here for more exposure.
- Are operations on a PDF (creation, modification, conversion) dirty? If so, are they CPU or IO bound?
  - Got some confirmation on slack that this is CPU bound, which was also my hypothesis.
  - If reading/writing the PDF file from the nif, would that change to being IO bound?
  - Not sure exactly how one would determine which scheduler option to choose if a nif is both IO and CPU bound; for example, does CPU take precedence over IO if they were equally dirty?
- Is a nif even the right tool for this? Would a port suffice?
  - I have a nif working, but am a bit concerned about safety…
  - Would wrapping the C++ in some rust, and using rustler, be of any benefit?
- We are passing strings from Elixir → C++ that represent the contents of a PDF file, and returning the contents of a new PDF file as a string.
  - Would using a resource binary be better for the input/output (more performant/safer)?
  - I was having some trouble getting this working; if this is the best way, I might need some help.
- Having a bit of trouble dynamically linking the PDFTron compiled .so file in the context of a mix package (portable), tips?
  - Need to do something like this:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"$(pwd)/c_src/Lib"
  - Could I just set something in the nif init function?
    System.put_env("LD_LIBRARY_PATH", ...)
  - Where should I put this .so file? I have it in c_src/Lib of the mix project; should it be in priv/ with my compiled .so?
Thanks in advance for taking the time to reply!
Most Liked
ityonemo
1. I would personally wrap this in an os thread, using enif_thread_create and friends.
1b. If you’re actually reading/writing the file inside the nif (which i don’t think you should do unless you need mmap or O_DIRECT, which you shouldn’t for pdfs), then it probably depends on the contents of the pdf.
2. I would probably pick a port instead of a nif, because pdf is inherently unsafe. You could also do something like set up a c_node.
2a. The inherently unsafe parts of pdf have nothing to do with the forms of safety that rust gives you, so wrapping in rustler will do nothing for you except making your calls at the boundary safe. (I developed Zigler, which lets you wrap in zig; similarly, there would be no safety benefits except making sure your nif interface is correct and marshalling into C types and doing ArgumentError if you mess up)
3. Resources are only useful if you are passing either a. mutable binaries or b. unserialized data between NIF functions. If your functions aren’t “in place” binary functions, then it’s much better to just marshal into erlang’s binaries.
4. Drop it into priv, and get the priv directory using :code.priv_dir, your life will be so much simpler. I believe Rustler and Zigler do this by default.
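To make the port suggestion in point 2 a bit more concrete, here is a minimal sketch of what the Elixir side could look like. The module name `PdfPort`, the `convert/3` function, and the wire format are all assumptions for illustration: it presumes a hypothetical external executable that reads one message and writes one reply, each framed with a 4-byte length prefix (which is what `{:packet, 4}` expects). Nothing here is PDFTron's API.

```elixir
defmodule PdfPort do
  # Hypothetical wrapper: sends one document to an external program and
  # waits for a single reply. Assumes the program frames every message
  # with a 4-byte big-endian length prefix, matching {:packet, 4}.
  def convert(executable, input, timeout \\ 5_000) when is_binary(input) do
    port =
      Port.open({:spawn_executable, executable}, [
        :binary,
        :exit_status,
        {:packet, 4}
      ])

    # The port driver adds the length prefix for us.
    true = Port.command(port, input)

    receive do
      {^port, {:data, output}} ->
        Port.close(port)
        {:ok, output}

      # The external program exited before replying (e.g. it crashed).
      {^port, {:exit_status, status}} ->
        {:error, {:exit_status, status}}
    after
      timeout ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end
```

The point of this shape is isolation: if the external program crashes on a bad PDF, the caller gets an `{:error, {:exit_status, status}}` tuple instead of the whole VM going down, which is what a crash inside a nif would do. Spawning one OS process per document is the simplest version; a long-lived port would avoid the startup cost.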
elcritch
For finding the .so file, the most common pattern is using :code.priv_dir/1.
See Reading files from your priv dir in elixir - Mintcore
Though with the updated build setup in Elixir 1.9+ it’s more stable to put it in the _build folder. I just copy the Makefile from elixir-circuits/circuits_gpio on GitHub.
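As a rough sketch of the :code.priv_dir/1 pattern mentioned above, something like the following could build the path to the compiled library. The module name `NifPath`, the function `resolve/2`, and the library name used in the examples are placeholders, not names from this thread.

```elixir
defmodule NifPath do
  # Builds the path that would be handed to :erlang.load_nif/2.
  # `app` and `lib_name` are placeholders for your own OTP app name
  # and the base name of your compiled nif library.
  def resolve(app, lib_name) when is_atom(app) and is_binary(lib_name) do
    case :code.priv_dir(app) do
      {:error, :bad_name} ->
        # The application is not known to the code server.
        {:error, {:unknown_app, app}}

      priv_dir ->
        # :code.priv_dir/1 returns a charlist. Leave the extension off:
        # load_nif/2 appends the platform-specific one (.so, .dll) itself.
        {:ok, Path.join(List.to_string(priv_dir), lib_name)}
    end
  end
end
```

In a real project this would typically be called from the nif module's @on_load callback, passing the result (converted with String.to_charlist/1) to :erlang.load_nif/2. Resolving the path at runtime like this is what keeps it working both under _build in development and inside a release.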








