FYI: the open source state of the art in this area is Playwright (the successor ...

caesil · on March 11, 2024

I think https://github.com/diegomura/react-pdf is closer to what this company is doing.

In fact their open source library, https://github.com/OnedocLabs/react-print-pdf, seems like a higher-level library that sits above react-pdf. Reminds me a lot of the set of react-pdf based components I built for a corporate job where letting users create PDFs was a huge part of the value proposition.

They're solving a really cool problem, actually, because building out into certain difficult use cases like SVG support was a huge pain.

AugusteLef · on March 12, 2024

Exactly. We are aiming to offer a solution for building complex PDF designs, which means having 100% control over the layout (margins, header, footer), the style, and the content. That's why we integrated Tailwind, Chakra UI, Markdown, and LaTeX, and also wanted to support SVG, etc.

Titou325 · on March 11, 2024

We are currently experimenting with this approach. A good thing about paged.js is that we would be able to provide hot-reload and live preview of files without actually converting to PDF.

Your second point is very interesting, seems like some kind of .assert('text').isVisible() API. We may want to dig into that further!

rudasn · on March 11, 2024

Or maybe some visual diffing against the expected output, based on the template/layout/theme used, since you'd want to perform this check on every PDF generated in prod (which has real, sensitive data), not just in CI or testing mode, if you're aiming for critical docs.
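A minimal sketch of that visual-diff idea, assuming each page has already been rasterized to a grid of grayscale pixels. The rasterization step, the function names, and both thresholds are illustrative assumptions, not something Onedoc or paged.js provides:

```python
from typing import List

Raster = List[List[int]]  # rows of 0-255 grayscale pixel values


def diff_ratio(expected: Raster, actual: Raster, tolerance: int = 8) -> float:
    """Return the fraction of pixels whose values differ by more than `tolerance`."""
    if len(expected) != len(actual) or any(
        len(e_row) != len(a_row) for e_row, a_row in zip(expected, actual)
    ):
        raise ValueError("rasters must have identical dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for e_row, a_row in zip(expected, actual)
        for e, a in zip(e_row, a_row)
        if abs(e - a) > tolerance
    )
    return changed / total


def layout_matches(expected: Raster, actual: Raster, max_ratio: float = 0.01) -> bool:
    """Accept the page when at most `max_ratio` of its pixels changed."""
    return diff_ratio(expected, actual) <= max_ratio
```

In prod, where the data differs per document, you would probably compare only the static regions of the template (header, footer, margins) rather than the whole page.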

Cool project btw, congrats for the launch!

cyanydeez · on March 12, 2024

safer would be out-of-band PDF-to-image conversion, then OCR

When dealing with layouts and assurance, I would go way out to verify as close to print as possible.

there's also a discussion to be had about color-to-grayscale printing, where you want a document to stay in character when printed in grayscale.
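One rough way to check this is to compare the gray levels of colors that are meant to look different. The sketch below assumes 8-bit RGB inputs and uses the Rec. 601 luma weights; the function names and the gap of 40 are arbitrary choices for illustration, and a real printer driver may convert differently:

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def luma(color: RGB) -> float:
    """Approximate gray level (0-255) using Rec. 601 luma weights."""
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b


def distinct_in_grayscale(a: RGB, b: RGB, min_gap: float = 40.0) -> bool:
    """True when two colors should still look different after grayscale conversion."""
    return abs(luma(a) - luma(b)) >= min_gap
```

For example, pure red `(255, 0, 0)` and a medium green `(0, 130, 0)` land on almost the same gray level, so a chart that relies on that pair would lose its meaning on a black-and-white printer.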

these would be premium features I'd think.

timvdalen · on March 11, 2024

(How) does it handle CMYK and print PDFs? I see images of printed books created by Paged.js. Were these post-processed, or printed using a printer that does a best-effort RGB conversion?

ak217 · on March 11, 2024

I'm not sure - we don't do color correction on our PDFs because we don't have photos in them and color rendering is not mission critical - but paged.js is focused on the concern of layout for print media. I would imagine color rendering can be solved orthogonally to what paged.js does for you, as long as you specify the color data in CSS. I'm pretty sure paged.js will pass it through without messing with it, so you're good if the browser that Playwright/Puppeteer is driving supports the correct color profile when emitting the PDF. I honestly don't know if browsers have sufficient support for that when emitting a PDF, though.

Overall you're right that color correction is another area where you could probably command a premium.

timvdalen · on March 11, 2024

It's certainly an area with more depth than I anticipated when I first started getting into it. Adobe is still pretty much the only one that can get a PDF compliant with print standards.

As far as I know, there's currently no way to get colors adhering to print color profiles in CMYK out of browsers.

Indeed, if color correctness isn't mission critical, I can imagine that going with Paged.js can be a nice experience!

(Edit: in my experience so far, it's been really really hard to 'correct' colors from an existing PDF in a way that gets a satisfying end result---the colors are usually muted/washed out)
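To illustrate why profile-free conversion is only a rough approximation: the textbook RGB-to-CMYK formula below is pure arithmetic and knows nothing about the ink, paper, or press. It is a sketch of the naive formula only (the function name is made up for this example), not what any browser or Adobe tool does:

```python
from typing import Tuple


def naive_rgb_to_cmyk(r: int, g: int, b: int) -> Tuple[float, float, float, float]:
    """Profile-less conversion of 8-bit RGB to CMYK fractions in [0, 1]."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)
    if k == 1:  # pure black: avoid dividing by zero below
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return c, m, y, k
```

A print-accurate result would instead go through a color management step with the source and destination ICC profiles, which is exactly the part that is missing from the browser pipeline.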

ak217 · on March 11, 2024

I was curious and searched around and found this presentation: https://www.w3.org/Graphics/Color/Workshop/slides/Erias.pdf

You're right - although many of the building blocks are there, it appears there is no way to specify a colorspace or print profile when asking Chrome to emit a PDF (and I doubt the other browsers are any better). Skia (the PDF rendering engine that Chromium uses) actually supports colorspace transforms, but Chromium doesn't seem to hook that up to CSS or even support non-RGBA colors in its rendering pipeline.

Mick-Jogger · on March 12, 2024

Isn't Playwright a testing framework? I'm not sure how this solves the use case that Onedoc is aiming for. I would be highly interested in some more background, as we are evaluating alternative solutions to princeXML right now.

ak217 · on March 12, 2024

Playwright at its core is a headless browser driver. In this case, we are using it to tell the browser to generate a PDF.
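Roughly, that looks like the sketch below, which uses Playwright's Python sync API. The helper name, the option values, and the margins are placeholders for illustration, not our actual setup:

```python
from pathlib import Path

# Options passed to page.pdf(); the values here are illustrative defaults.
PDF_OPTIONS = {
    "format": "A4",
    "print_background": True,
    "margin": {"top": "20mm", "right": "15mm", "bottom": "20mm", "left": "15mm"},
}


def html_to_pdf(html: str, out_path: Path) -> None:
    """Render an HTML string to a PDF file with headless Chromium."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        try:
            page = browser.new_page()
            page.set_content(html, wait_until="load")
            page.pdf(path=str(out_path), **PDF_OPTIONS)
        finally:
            browser.close()
```

Calling `html_to_pdf("<h1>Invoice</h1>", Path("out.pdf"))` writes the file. It needs `pip install playwright` followed by `playwright install chromium`, and PDF output only works with headless Chromium.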