It is pretty good for the simple case: it prints in 10ms (binary returned inline, encoded). With output to file it runs a bit slower when the document is small.
Yeah, unfortunately that is about the amount of error information Ghostscript usually gives us. It’s a blast. Not sure what is wrong exactly; it could be incompatibilities with the version of Ghostscript you’re using, or some setup thing. Unfortunately that part of ChromicPDF is rather fragile, though we are using the feature ourselves, currently with Ghostscript 9.55 in an Alpine 3.15-based container. I will take a look at Ghostscript 9.56 when I find the time; I created a ticket for it.
Regarding your speed & size concerns: As you said, these are out of ChromicPDF’s influence unfortunately. Still good to know, of course. If people are looking for minimum PDF file size, rendering with Chrome is likely not the way to go.
Size: We use it exclusively for 1-2 page text documents, which usually clock in at around 30 KB. I suspect that the size of the document is dominated by included images & fonts, and image quality.
Speed: As @evadne said, it would be nice to see your benchmarks. Usually it’s blazing fast for us, and orders of magnitude faster than anything that starts fresh Chrome instances for each PDF. Of course it is possible that wkhtmltopdf is still faster, perhaps due to a faster rendering engine. But I have doubts, and the “2 to 2.5 times slower” you quote seems a lot.
Not really test code. I pass HTML strings of real-life documents to print_to_pdf/2 and write the PDF to a file via the output: option. I am not entitled to share those docs, but while they are not trivial like the “Hello, world!” type of fixture you refer to, they’re not overly complex either: fewer than three “Letter” pages, a single font face, an image or two etc. As mentioned, this type of document comes to about 40 to 50 KiB with wkhtmltopdf, which renders and saves it consistently in a tad under one second. With Chrome the timing is much more unpredictable (higher deviation), but never less than two seconds so far.
Thank you for coming back on it. I’ll see if I can get an earlier Ghostscript version on the dev machine. As for the speed, I am wondering what might be the reason, if you say ChromicPDF might even be faster than wkhtmltopdf. Do I understand correctly that /unless/ I set the on_demand: true option, the default setup is pooled, with the default pool sizes mentioned in the docs? Maybe something lies there, because I haven’t noticed any significant difference between the two setups. But I didn’t specify any pool options. And yes, zombies invaded my machine in this setup, so I took it that it worked.
Do I understand correctly that /unless/ I set on_demand: true option, the default setup is pooled
Yes, on_demand essentially bypasses the entire supervision tree booting, and instead starts the relevant ChromicPDF processes as well as the external Chrome process when you call print_to_pdf. So, if you’re testing this for example in a .exs script and only print a single PDF, these two modes of operation will in fact appear to behave the same. You should notice a drastic difference when you perform manual tests on the console and print multiple PDFs.
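To make the two modes concrete, here is a minimal sketch of how they might be wired into a supervision tree. The module name MyApp.PdfSupervision and the pool size of 3 are made up for illustration. on_demand is the option discussed above; session_pool: [size: ...] is my reading of the pool options in the docs, so please check it against the ChromicPDF version you have installed.

```elixir
defmodule MyApp.PdfSupervision do
  # Hypothetical helper, not part of ChromicPDF: returns the child list
  # for one of the two modes of operation discussed in this thread.

  # Pooled (default) mode: Chrome is started once, together with the
  # supervision tree, and its sessions are reused across print jobs.
  # The pool size of 3 is an arbitrary example value.
  def children(:pooled), do: [{ChromicPDF, session_pool: [size: 3]}]

  # On-demand mode: Chrome is only started when print_to_pdf is called.
  def children(:on_demand), do: [{ChromicPDF, on_demand: true}]
end
```

With chromic_pdf in your deps, something like Supervisor.start_link(MyApp.PdfSupervision.children(:pooled), strategy: :one_for_one) should boot the pooled variant. Printing several PDFs in a row from an iex session is where the difference between the two modes should show up.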
In order to debug this further though, it would be great if you could provide a minimum working example, i.e. some benchmark script with a PDF template that shows the slowness/unpredictability you’re experiencing.
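As a starting point, such a benchmark script could look roughly like the sketch below. PdfBench is a hypothetical helper (not part of ChromicPDF) that times any zero-arity function with :timer.tc and reports min/avg/max in milliseconds, so the deviation between runs becomes visible. The file names in the usage comment are placeholders.

```elixir
defmodule PdfBench do
  # Hypothetical timing helper for this thread; not part of ChromicPDF.

  # Runs `fun` `runs` times and returns timing statistics in milliseconds.
  def measure(fun, runs \\ 10) when is_function(fun, 0) and runs > 0 do
    times_ms =
      for _ <- 1..runs do
        # :timer.tc/1 returns {elapsed_microseconds, result}
        {micros, _result} = :timer.tc(fun)
        micros / 1000
      end

    %{
      runs: runs,
      min: Enum.min(times_ms),
      max: Enum.max(times_ms),
      avg: Enum.sum(times_ms) / runs
    }
  end
end

# Usage, assuming ChromicPDF is already running (pooled) and dummy.html
# is a shareable stand-in for the real document:
#
#   html = File.read!("dummy.html")
#   PdfBench.measure(fn ->
#     ChromicPDF.print_to_pdf({:html, html}, output: "out.pdf")
#   end)
```

Running the same measurement once with the pooled setup and once with on_demand: true, on both machines, should help narrow down where the time goes.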
Do you think your project may evolve and become more versatile? I still can’t find an easy-to-use Elixir library that just extracts the words from a PDF.
Roger. I’ll check it all step by step in a day or two and if nothing helps I’ll make a “dummy” HTML document of the type in question and provide it for checking. Tnx so far once more.
OK, so I was able to get back to this and scrutinise my setup details. I thought I had found the culprit when I managed to get below 300ms on average. The reason for the previous lack of performance (I thought) was that ChromicPDF, instead of the current “Chromium” browser, picked a two-year-old “Chrome” I had forgotten was still installed. And since I had blocked Google’s malware updater software from running, it hadn’t been updated for over two years. So everything looked great… for a moment. Once things started to work well on the dev machine running “macos”, I moved to the one running GNU/Linux. In the end this is what the production env runs. Here I also made sure that the very same, latest “Chromium” version[*] is run in place of the previously installed packaged one, and I still got times over two seconds (pooled). Yes, the Linux machine has similar hardware capabilities, so hardware definitely does not explain the up to 10x factor I observe. And yes, that machine clocks similar times with wkhtmltopdf (around 800ms) as the “macos” computer I normally use for dev work.
Shall return to this and do some more “tracing”.
* - in both cases I downloaded the same version directly off the “Chromium” project rather than using packaged versions
If you are benchmarking/looking for speed, give weasyprint a go… I simply use it by calling the CLI using rambo (quick copy/paste code below), but you can set it up with ports (see “Generate PDFs in Elixir” on cmdarek.com) and have the weasyprint instance running at any time, including caching fonts/images for fast response…
Might be quite the rabbit hole though :/
Code for calling the CLI (which obviously incurs a startup penalty); populate_html simply calls EEx.eval_file with the template path and data:
defmodule RamboPdf do
  # Renders the invoice template for `item` with the weasyprint CLI,
  # writes the PDF to disk and returns the PDF binary.
  def create_pdf(item) do
    css = Path.join([PDFfile.path(), "templates/invoice", "invoice.css"])

    html =
      PDF.gen_data(item)
      |> PDF.populate_html("templates/invoice", "invoice.html")

    # Build an ASCII-only file name: trim, replace spaces, drop non-ASCII characters.
    safe_name =
      PDFfile.fix_name(item.name)
      |> String.trim()
      |> String.replace(" ", "_")
      |> String.to_charlist()
      |> Enum.filter(&(&1 in 0..127))
      |> List.to_string()

    output_file = Path.join([PDFfile.path(), "output", "#{item.room}_#{safe_name}.pdf"])

    # https://doc.courtbouillon.org/weasyprint/stable/api_reference.html#command-line-api
    # The two "-" arguments make weasyprint read HTML from stdin and write the PDF to stdout.
    task =
      Task.async(fn ->
        Rambo.run("weasyprint", ["--encoding", "utf8", "-q", "-s", css, "-", "-"],
          in: html,
          log: false,
          timeout: 20_000
        )
      end)

    rambo = Task.await(task, :infinity)

    case rambo do
      {:ok, %Rambo{err: _err, out: output, status: 0}} ->
        File.write(output_file, output, [:binary])
        output

      error ->
        IO.inspect("error")
        IO.inspect(error)
    end
  end
end
Thank you for the suggestion. The main reason for going with ChromicPDF was that I need the browser-printed and the server-generated PDF output to look the same, and that without synchronising and sync-maintaining two different templates. Of course there may be some minor differences between how each of the three major browsers renders the printout, but tests have shown that those are negligible. IOW I am not really benchmarking for speed but rather trying to understand the reasons for the severe underperformance I experience /in some cases/.
Coming back after tuning the project with ChromicPDF that eventually went into production. The good news is that in production, the application, containerised with a recent Chromium and the Debian-packaged Ghostscript 9.53.3, works (pooled rather than “on demand”) faster than wkhtmltopdf while producing similarly sized output files (about 50 KiB)! The [very] long story short: both the performance and especially the output size depend heavily on what fonts are available for Chromium to pick for the given CSS. Using only non-commercial fonts (my laptop running GNU/Linux) makes a big difference when compared to a desktop running macos with lots of various fonts available to the browsers. In order to get the look and output size I am satisfied with, I eventually removed all preinstalled fonts from the container and added only those non-commercial ones I found to give good visual results, then picked the ones that give the smallest file size. All in all, after spending lots of time fine-tuning the container, I am very satisfied with the results. The only (acceptable) issue is that PDFs saved from the browser’s window and those generated on the server can (and often do) exhibit some visual differences due to different fonts being used. A known problem of course and, unless one wants to supply one’s own fonts for both cases, something to live with. Summing up: once more, thank you @maltoe, and kudos!
First of all, thank you and Chromic is really cool!
I’m encountering a rather silly problem when deploying on Fly.io: Chrome isn’t present. Do I just need to add it to my yaml (docker)? Has anyone had any experience with this?
I’d like to take this opportunity to wish everyone a happy new year