
FYI: the open source state of the art in this area is Playwright (the successor to Puppeteer) with Paged.js (https://pagedjs.org/). I highly recommend that everyone check out and donate to paged.js, it's a fantastic project with lots to like. It certainly blows commercial alternatives like Prince XML out of the water.

That forms a solid foundation that I find it hard to imagine paying for. The things where you might still command a premium are basically safety mechanisms/CI checks/library components that ensure the PDF renders correctly in the presence of variable-length content, etc. as well as maybe PDF-specific features like metadata and fillable forms. Naive ways to format headers, footers, tables/grids/flexboxes etc. often fail in PDFs because of unexpected layout complications. So having a methodology, process, and validation system for ensuring that a mission critical piece of information appears on a PDF in the presence of these constraints could be attractive.



I think https://github.com/diegomura/react-pdf is closer to what this company is doing.

In fact their open source library, https://github.com/OnedocLabs/react-print-pdf, seems like a higher-level library that sits above react-pdf. Reminds me a lot of the set of react-pdf based components I built for a corporate job where letting users create PDFs was a huge part of the value proposition.

They're solving a really cool problem, actually, because building out into certain difficult use cases like SVG support was a huge pain.


Exactly. We are aiming to offer a solution for building complex PDF designs, which means having 100% control over the layout (margins, header, footer), the style, and the content. That's why we integrated Tailwind, Chakra UI, Markdown, and LaTeX, and also wanted to support SVG, etc.


We are currently experimenting with this approach. A good thing about paged.js is that we would be able to provide hot-reload and live preview of files without actually converting to PDF.

Your second point is very interesting, seems like some kind of .assert('text').isVisible() API. We may want to dig into that further!
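
A minimal sketch of what such a check could look like as a post-render step, assuming the page text has already been extracted from the generated PDF. The names `missing_text` and `extract_page_texts` are hypothetical, and pypdf is only one possible extractor. Note that extractable text does not prove the text is visible (it could be clipped or covered), so this is a first-line check only.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Collapse whitespace and case so line wrapping in the PDF does not cause false alarms."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_text(page_texts: Iterable[str], required: Iterable[str]) -> list[str]:
    """Return the required strings that appear on no page."""
    pages = [normalize(t) for t in page_texts]
    return [r for r in required if not any(normalize(r) in p for p in pages)]


def extract_page_texts(pdf_path: str) -> list[str]:
    """Assumption: pypdf is installed; any other text extractor could be swapped in."""
    from pypdf import PdfReader  # imported lazily so the pure check above has no dependency

    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]
```

Usage would be something like `missing_text(extract_page_texts("invoice.pdf"), ["Total due: $120.00"])`, failing the job if the returned list is not empty.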


Or maybe some visual diffing against expected output, based on the template/layout/theme used, since you'd want to perform this check on every PDF generated in prod (which has real, sensitive data), not just in CI or testing mode, if you're aiming for critical docs.

Cool project btw, congrats for the launch!


Safer would be out-of-band PDF-to-image conversion, then OCR.

When dealing with layouts and assurance, I would go way out to verify as close to print as possible.

There's also a discussion about color to black/grayscale printing, where you want a document to stay in character at grayscale.

These would be premium features, I'd think.
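
One rough way to sketch the grayscale concern: compute the relative luminance of two colors and check that they are still distinguishable once hue is gone. This uses the WCAG-style luminance and contrast formulas; the assumption is that the printer's grayscale conversion roughly tracks luminance, which real devices may not do, and any threshold you pick is a judgment call.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance: 0.0 for black, about 1.0 for white."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """Contrast between two colors, from 1 (identical luminance) up to about 21 (black on white)."""
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)
```

Two brand colors with very different hues can still land at a low ratio here, which is a hint that they may merge on a black-and-white printout.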


(How) does it handle CMYK and print PDFs? I see images of printed books created by Paged.js, were these post-processed, or printed using a printer that does a best-effort RGB conversion?


I'm not sure - we don't do color correction on our PDFs because we don't have photos in them and color rendering is not mission critical - but paged.js is focused on the concern of layout for print media. I would imagine color rendering can be solved orthogonally to what paged.js does for you, as long as you specify the color data in CSS. I'm pretty sure paged.js will pass it through without messing with it, so you're good if the browser that Playwright/Puppeteer is driving supports the correct color profile when emitting the PDF. I honestly don't know if browsers have sufficient support for that when emitting a PDF, though.

Overall you're right that color correction is another area where you could probably command a premium.


It's certainly an area with more depth than I anticipated when I first started getting into it. Adobe is still pretty much the only one that can get a PDF compliant with print standards.

As far as I know, there's no way to currently get colors adhering to print color profiles in CMYK out of browsers.

Indeed, if color correctness isn't mission critical, I can imagine that going with Paged.js can be a nice experience!

(Edit: in my experience so far, it's been really really hard to 'correct' colors from an existing PDF in a way that gets a satisfying end result---the colors are usually muted/washed out)
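
For illustration only, here is the textbook profile-less RGB to CMYK conversion. It is not what a color-managed workflow does: it ignores ICC profiles, ink limits, and paper, so it is a sketch of a best-effort conversion rather than a print-correct one. The function name is hypothetical.

```python
def naive_rgb_to_cmyk(r: int, g: int, b: int) -> tuple[float, float, float, float]:
    """Device-independent RGB -> CMYK with no ICC profile (illustrative only)."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)  # black is whatever the brightest channel leaves over
    if k == 1:
        return (0.0, 0.0, 0.0, 1.0)  # pure black; avoids dividing by zero below
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return (c, m, y, k)
```

A real pipeline would map through source and destination profiles instead of this arithmetic.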


I was curious and searched around and found this presentation: https://www.w3.org/Graphics/Color/Workshop/slides/Erias.pdf

You're right - although many of the building blocks are there, it appears there is no way to specify a colorspace or print profile when asking Chrome to emit a PDF (and I doubt the other browsers are any better). Skia (the PDF rendering engine that Chromium uses) actually supports colorspace transforms, but Chromium doesn't seem to hook that up to CSS or even support non-RGBA colors in its rendering pipeline.


Isn't Playwright a testing framework? I'm not sure how it solves the use case that Onedoc is aiming for. I would be very interested in some more background, as we are evaluating alternatives to Prince XML right now.


Playwright at its core is a headless browser driver. In this case, we are using it to tell the browser to generate a PDF.
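
A minimal sketch of that flow using Playwright's Python API, under a few assumptions: the Paged.js polyfill is loaded from the unpkg CDN URL below, and Paged.js has added its `.pagedjs_pages` container by the time we print (that container can appear before pagination has fully finished, so a real setup would want a stronger completion signal). `build_html` and `render_pdf` are hypothetical helper names.

```python
from pathlib import Path

# Assumption: polyfill served from this CDN URL; pin a version or self-host it in practice.
PAGED_POLYFILL = "https://unpkg.com/pagedjs/dist/paged.polyfill.js"


def build_html(body: str, page_css: str = "@page { size: A4; margin: 20mm; }") -> str:
    """Wrap a body fragment in a document that loads the Paged.js polyfill."""
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<style>{page_css}</style>"
        f"<script src='{PAGED_POLYFILL}'></script>"
        f"</head><body>{body}</body></html>"
    )


def render_pdf(html: str, out_path: Path) -> None:
    """Drive headless Chromium and ask it to print the page to a PDF file."""
    # Imported lazily so build_html works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html, wait_until="networkidle")
        # Assumption: this container signals that Paged.js has started laying out pages.
        page.wait_for_selector(".pagedjs_pages")
        page.pdf(path=str(out_path), prefer_css_page_size=True, print_background=True)
        browser.close()
```

Calling `render_pdf(build_html("<h1>Invoice</h1>"), Path("out.pdf"))` would then write the file, provided Playwright and its Chromium build are installed.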



