Puppeteer-PDF - Converting HTML to PDF

Hi, a couple of months ago I’ve tested some libraries to convert HTML into PDF, but in the end I’ve decided to create a wrapper for puppeteer-pdf that is a NodeJS binary. Puppeteer is the name of the software that allow to control a Chrome windowless instance, and with puppeteer-pdf it allow to render into a PDF an HTML file.

GitHub project in https://github.com/coletiv/puppeteer-pdf
Medium article where I compare to WkHtmlToPdf (the most popular library to generate PDF in Elixir), https://medium.com/coletiv-stories/puppeteer-vs-wkhtmltopdf-and-why-i-created-a-new-module-9466eb1db7d1

Any feedback is welcome.

6 Likes

Neat :slight_smile:

Some comments:

As noted in the Library guidelines using Application.get_env should be avoided in libraries as it is both cumbersome and easily results in unresolvable conflicts in needs. Consider moving it to the options parameters that is already there.

I suppose this next thing is an application issue, but constructing command line params as it currently is without checking for special/control characters in them is a security hole waiting to open.

It also looks like a small landmine that Briefly cleans up on process exit, but generate/3 does create a process; so calling this repeatedly from a long-lived process will (probably unexpectedly to the user) create a bunch of tempfiles that never get deleted and slowly fill up an ets table in the background. Might be nice to spawn a process for the work in generate.

The function names … they don’t feel immediately clear as to which does which just from the name and there are no docs (also: write docs and typespecs :slight_smile: … perhaps PuppeteerPDF.Generate.from_string/3 and PuppeteerPDF.Generate.from_file/3 would be more explicit and clear? And yes, that implied making a PuppeteerPDF.Generate module, but to my eyes it reads a lot clearer as to what is going on…

As for features, would be awesome if you could give it a URL and fetch the HTML locally … but that’s a significant feature add… that said, one can imagine the name of that function easily with the Generate.from_* pattern …

Hope you don’t mind so many comments / critiques eek it is great to see more modules and libraries in the community and this is a great start to something a lot of people need…

5 Likes

It’s actually fine with System.cmd because it does not call the shell. It’s not possible to do classical shell-injection with things like "foo; $(rm -rf /)" - the pupeteer executable will receive it as a whole string. There might be some issues in what it does, in turn, with them, so it’s better to escape anyway, but the direct problem is not actually there.

5 Likes

Thanks for your time spent on given me feedback. I only had time to reply now, but I will start doing some improvements that you mention.

I didn’t know about this guideline, I will check all the recommendations.

At the end of this sections it says,

The application environment should be reserved only for configurations that are truly global.

Shouldn’t that be ok for the use case of this library, that is the path to execute the NodeJS binary ?

Related to generate/3, I don’t see the where it can hold up on this library. Are you talking about in some case that the System.cmd/2 don’t terminate or the process hangs ?

Related to the possibility of command injection, I known that in theory that can be possible, but I didn’t have time yet to test it, I will also check that.

I will improve both docs and functions names, thanks!

Hey @speeddragon,

Is it possible to generate a PDF from a url?
Is there any options for enabling cookies when generating a PDF? (so that even generating a PDF from a page that requires authentification is possible).

Hi @ryanzidago, currently this isn’t supported. This library use https://www.npmjs.com/package/puppeteer-pdf to generate the PDF file, and I’ve looked also into pdf_generator library, it doesn’t supported either.

To support what you want, I would go with a custom javascript code using puppeteer example. You can add cookies there, can trigger the render of PDF there. Should be simple.

If you want to contribute, you can probably use the github source and create a PR for both javascript and elixir module, but I think that the project isn’t active (the nodejs one).

1 Like

Thank you for the quick reply!
Aslo with pdf_generator, one can generate PDF from URL, but I still have to find a way to pass cookies.