ChromicPDF

Hello!

Came here to announce ChromicPDF, a pet project PDF generator I’ve been working on for the past few months. Why another PDF generator, you may be asking? Because it was fun to implement a client for (small parts of) the Chrome DevTools Protocol in Elixir.

That’s basically Chromic’s main feature: it can talk to Chrome without needing puppeteer or Node.js at runtime. It launches Chrome, spawns a number of targets that it keeps around, and instructs them to navigate to a URL whenever a PDF needs to be printed. In sum, this makes for a relatively performant PDF generation flow.

For good measure, I threw in a PDF/A converter using Ghostscript, capable of creating veraPDF-test-passing PDF/A-3b files - obviously inspired by a project where we needed to generate invoice PDFs for long-term storage.

Would be super interested in any opinion you might have! It’s not (yet) on hex.pm, will wait a bit to see if people are interested. Documentation can be found in the README and, oddly, in the Supervisor module.

cheers,
Malte

:tada:

Also kudos to my employer for letting us work on pet projects :+1:

Hi maltoe,

This looks like an interesting project, kudos on building it! :+1:

I’ve got some feedback after reading the project’s README.
When calling the ChromicPDF.print_to_pdfa function, you’re providing a map in the print_to_pdf argument. However, the map keys are in camel case, while in Elixir I think it’s more common to use snake case. So, for example, you’d use:

print_to_pdf: %{
    # Margins are given in inches
    margin_top: 0.393701,
    margin_left: 0.787402,
    margin_right: 0.787402,
    margin_bottom: 1.1811,
    ...
}

instead of the more javascript-like one in the example:

print_to_pdf: %{
    # Margins are given in inches
    marginTop: 0.393701,
    marginLeft: 0.787402,
    marginRight: 0.787402,
    marginBottom: 1.1811,
    ...
}

This might just be nitpicking but I thought I should let you know! Congrats once again on building this :slight_smile:

I have not read the code, nor do I know the Chrome API, but maybe those options are passed directly to the API and he just wanted to avoid manually converting from snake case to camel case.
If that is the case, I personally would leave it as it is as it follows the API and people who are familiar with the API would have an easier time working with it. However, if the goal is to completely abstract that, then having them in snake case would be better.

It depends on the viewpoint.
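
To illustrate what the "convert manually" option would involve, here is a minimal sketch of a key converter. The `KeyCase` module and its function names are hypothetical and not part of ChromicPDF; it only shows the snake case to camel case mapping discussed above.

```elixir
defmodule KeyCase do
  # Hypothetical helper, not part of ChromicPDF: turns snake_case option keys
  # into the camelCase string keys that Chrome's API expects.
  def camelize_keys(opts) when is_map(opts) do
    Map.new(opts, fn {key, value} -> {camelize(key), value} end)
  end

  # "margin_top" -> "marginTop": the first segment is kept as-is, the
  # remaining segments are capitalized and joined.
  defp camelize(key) do
    [head | tail] = key |> to_string() |> String.split("_")
    Enum.join([head | Enum.map(tail, &String.capitalize/1)])
  end
end

KeyCase.camelize_keys(%{margin_top: 0.393701, margin_left: 0.787402})
```

The trade-off mentioned above still applies: a converter like this hides the Chrome naming from Elixir callers, but anyone reading the Chrome documentation has to translate the names back.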

Great work. The fewer dependencies - the better.

Release it to hex!

I wish there was also a maintained library that writes PDFs without any dependencies, even Chrome. Straight to binary.

This is quite a tough one… You need something to either render the fonts into a path that you can embed into the PDF, or into something that you can embed as a font into the PDF.

There is for example PrawnPDF in Ruby https://github.com/prawnpdf/prawn and they do it without a browser or other libs. I always wondered why there is nothing like this in Elixir.

@dino @Phillipp is correct, these are passed directly to Chrome and I didn’t want to convert them, or have to maintain a list of them in the library, etc. Perhaps I should convert them to string keys in the README to make this more clear? What do you think?

@egze There is Gutenex which doesn’t have any dependencies and writes PDF. However, PDF is a pretty complex format and writing PDF directly also means you have to come up with a proper text processing API (i.e. something like LaTeX), and in the end teach your users/designers to use it.

IMO this is a call you have to make again with every project that has PDF rendering requirements. Rendering quality is likely to be better with a text processor (not saying Chrome doesn’t render “good enough” PDFs, just that with LaTeX you can do even better); but rendering HTML to PDF gives you the benefit that your designers can easily write templates in their usual environment and new developers can easily make changes without having to learn/adapt to a text processing language. Plus, browser support for print stylesheets and printing webpages in general has become much better in recent years; for example, Chrome automatically renders table headers again on new pages (if your table has proper <thead> tags).

@maltoe I was wondering, how does it work with Chrome with many parallel calls to create a PDF? Is the same Chrome instance reused, or is one started every time? How many parallel calls can it handle?

I don’t know if converting the keys to strings would make this more clear, so I’d leave it as is :slight_smile:

@egze The same Chrome instance is reused. It has multiple tabs open that are part of a session pool (using poolboy for now), so it can handle multiple calls in parallel. However, there is still only a single connection between Elixir and the Chrome instance, so in theory that could become a bottleneck, although there isn’t much communication happening and I assume most time is spent inside the tab processes when navigating the page / generating the PDF. It’s essentially puppeteer in Elixir :-). Of course, it’s very limited in terms of the functionality it supports from the Chrome API, but conceptually it works the same.

How many parallel calls: I don’t know, I’ve set the number of workers in the session pool to 5, for no particular reason. Generally speaking, I haven’t done much benchmarking yet, only compared (eye-balled) it against printing a PDF from the command line with Chrome (i.e. spawning an instance every time), and ChromicPDF is definitely orders of magnitude faster.
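
For anyone wanting to experiment with parallel calls, a caller-side sketch could cap the number of in-flight requests at the pool size of 5 mentioned above. The `ParallelPrint` module and the `fake_print` function are stand-ins invented for illustration; in a real application the function passed in would invoke the library.

```elixir
defmodule ParallelPrint do
  # Runs `print_fun` for every input with at most `max_concurrency` calls in
  # flight. The default of 5 mirrors the session pool size mentioned above.
  def run(inputs, print_fun, max_concurrency \\ 5) do
    inputs
    |> Task.async_stream(print_fun, max_concurrency: max_concurrency, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Stand-in for the real print call (an assumption for illustration only):
# it just reports the size of the HTML it was given.
fake_print = fn html -> {:ok, byte_size(html)} end

ParallelPrint.run(["<h1>a</h1>", "<h1>bb</h1>"], fake_print)
```

Results come back in input order, so timing this with different `max_concurrency` values would be one rough way to see where the single connection or the pool starts to limit throughput.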

@egze If you want to maximize “PDF throughput”, you could use the {:html, <html blob>} way of passing input, which in turn calls the Page.setDocumentContent function to replace a tab’s contents.

I’ve also been thinking about connecting it to a LiveView to have it only mutate the DOM in places where it needs to (say you’re generating invoice PDFs and you only need to replace the addressee and order items) -> This would probably result in the lowest latency between call and PDF printed, though it’s a bit difficult to synchronize the LiveView-based page mutation and the Page.printToPDF call. Also, this is probably just me looking for interesting problems to solve, not a realistic use-case :slight_smile:

Hey @maltoe,

Thanks for writing this library, it’s very convenient and simple to install/use! I’ve been playing with it a little bit and tried to print a pdf from a url and it works great!

I’d like to print a pdf from a url which requires that the user is authenticated. One way to do this would be through the cookies in the session. Do you know if I can send cookies to Chrome with ChromicPDF?

I think it should be possible through the Chrome DevTools Protocol.

Hey @ryanzidago,

thanks! Glad you like it :+1:

Setting a cookie is certainly possible with the DevTools protocol, and could be implemented in ChromicPDF relatively easily somewhere here.
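
For reference, the DevTools Protocol has a `Network.setCookie` method, and the message sent to Chrome might look roughly like the map below. This is only a sketch: the `CookieMessage` module, the cookie name and value, the URL, and the message id are all placeholders, and how ChromicPDF would encode and dispatch the message is not shown.

```elixir
defmodule CookieMessage do
  # Builds the payload for the DevTools Protocol method Network.setCookie.
  # Hypothetical helper for illustration, not part of ChromicPDF.
  def build(id, name, value, url) do
    %{
      "id" => id,
      "method" => "Network.setCookie",
      "params" => %{"name" => name, "value" => value, "url" => url}
    }
  end
end

# Placeholder values; a real session cookie would come from the application.
CookieMessage.build(1, "session_id", "secret-token", "http://example.net")
```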

That being said, I’m trying to discourage users from printing PDFs from remote URLs (hence the default offline mode), for 2 reasons: First, printing from remote URLs seemed to be a bit unreliable in my tests, at least I got rather fluctuating response times (i.e. printing http://example.net a thousand times, sometimes for some reason Chrome would take a second or so to load the page), and second I think users should be well aware of the security implications when doing so.

However, as said, it’s definitely possible. Could you open an issue in the repo so we can discuss there?

Alright, new issue here: https://github.com/bitcrowd/chromic_pdf/issues/23