
@douglasmiranda
Last active October 31, 2024 22:31
About PDF to SVG converters

Update in 2024

So in 2024 I'm actually having a good time using mupdf.

About my experience with mupdf:

It's written in C, but you can use it in many ways: command line, Python lib (PyMuPDF), JS, and others.

I still haven't hit the issues that I had with other tools and weird broken PDFs, so I'd say pretty good.
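For the command line route, here is a minimal sketch of driving `mutool draw` from Python to get one SVG per page. The function names and the `name-%d.svg` output pattern are made up for illustration, and it assumes `mutool` is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def mutool_svg_command(pdf_path, out_dir):
    """Build a `mutool draw` command that writes one SVG per page."""
    pdf_path = Path(pdf_path)
    # mutool replaces %d in the output name with the page number.
    pattern = Path(out_dir) / f"{pdf_path.stem}-%d.svg"
    return ["mutool", "draw", "-F", "svg", "-o", str(pattern), str(pdf_path)]


def convert_with_mutool(pdf_path, out_dir):
    """Run the conversion; raises CalledProcessError if mutool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(mutool_svg_command(pdf_path, out_dir), check=True)
```

Calling `convert_with_mutool("report.pdf", "svg")` should leave files like `svg/report-1.svg` behind, one per page.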

Update: As pointed out in the comments, it seems jpdf2html5 is now BuildVu: https://www.idrsolutions.com/buildvu/


(Maybe in the future I will explain better what happens when you convert your document to PDF. For now, just keep the originals safe.)

The easy way

Depending on the PDFs, you could just extract the text with simple tools like pdftotext, which comes with Poppler Tools.
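A minimal sketch of that text extraction, assuming `pdftotext` from Poppler is on your PATH. The helper names are hypothetical; `-layout` asks pdftotext to keep the physical layout, and `-` as the output name sends the text to stdout.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path="-", keep_layout=True):
    """Build a pdftotext command; txt_path="-" means write to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the original page layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path):
    """Return the text of the PDF as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```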

Want to convert a simple page? You could simply load it in some vector editing software that accepts PDF as input and export it as SVG. Or maybe use an online converter. For a single document that's not too large, you could probably choose this option too.

The not so easy way

If you want to build software that needs to process lots of PDFs, then you want something else.

I've been testing a lot of software and I won't remember everything. I should have thought of writing something down back then, when I spent days looking for the perfect converter.

Inkscape

If your PDF is simple, not malformed, clean, and was created by trusted software, you could maybe try to simply use something like Inkscape. You can convert using its command line tool, and automating the work is easy. Some people claim it's ok; for me it didn't work. But that's because of... weird, malformed, complex PDFs.
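A rough sketch of what that automation could look like, assuming Inkscape 1.x (older 0.9x releases use different command line options) and that `inkscape` is on your PATH. The function names are hypothetical.

```python
import subprocess


def inkscape_command(pdf_path, svg_path):
    """Build an Inkscape 1.x command that exports a PDF as SVG."""
    return [
        "inkscape",
        "--export-type=svg",
        f"--export-filename={svg_path}",
        str(pdf_path),
    ]


def convert_with_inkscape(pdf_path, svg_path):
    """Run Inkscape headless; raises CalledProcessError on failure."""
    subprocess.run(inkscape_command(pdf_path, svg_path), check=True)
```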

Other tools

There are other tools that work better with PDFs when you can't choose how they were created: PDFs with lots of tables, graphic components, embedded fonts, missing fonts and so on.

It's worth mentioning that if you have malformed files, missing fonts, and other defects, you could try to correct those errors with the pdftocairo tool. cpdf, from Coherent PDF tools, is great too.
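One way to try that cleanup is to let pdftocairo rewrite the file as a fresh PDF before converting it. This is only a sketch: the function names are made up, it assumes `pdftocairo` from Poppler is on your PATH, and a rewrite like this is sometimes enough but won't rescue every broken file.

```python
import subprocess


def pdftocairo_rewrite_command(src_pdf, dst_pdf):
    """Build a pdftocairo command that re-renders a PDF into a new PDF."""
    # -pdf selects PDF as the output format.
    return ["pdftocairo", "-pdf", str(src_pdf), str(dst_pdf)]


def rewrite_pdf(src_pdf, dst_pdf):
    """Write a regenerated copy of src_pdf; the original is left untouched."""
    subprocess.run(pdftocairo_rewrite_command(src_pdf, dst_pdf), check=True)
```

Writing to a new file instead of overwriting fits the "keep the originals safe" advice above.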

  • jpdf2html5 Despite the name, it does conversion to SVG too.
    • I've tested it, and it does a very good job. I kind of don't even need to optimize the SVG files after the conversion.
    • But ... Java. It's command line, but it's a .jar, so if you were looking for a beautiful compiled binary, that's not the case.
    • Another thing: it's expensive. I know I said it was great, but dude, usually conversion to SVG is one of many steps you will take to create a final product. Do you know what I mean? You need other software, and will have other costs. Anyway, if you think their price is good for you, that's ok.

THE WINNER

  • pdf2svg I know, not a big project, not a lot of contributors.
    • It just converts! xD
    • I work with shitty PDFs and only a handful failed to convert, basically PDFs with copy protection that "scrambles words".
    • Files are big, but optimizing SVGs is not a very difficult job, check this out.
    • If you want to build it, and know your way around Docker, take a look at my Dockerfiles.
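
For processing lots of PDFs, a batch loop around pdf2svg could look something like the sketch below. It assumes `pdf2svg` is on your PATH and relies on its `all` page argument together with a `%d` placeholder in the output name. The function names and folder layout are hypothetical.

```python
import subprocess
from pathlib import Path


def pdf2svg_command(pdf_path, out_dir):
    """Build a pdf2svg command that converts every page of one PDF."""
    pdf_path = Path(pdf_path)
    # With "all", pdf2svg fills %d in the output name with the page number.
    pattern = Path(out_dir) / f"{pdf_path.stem}-%d.svg"
    return ["pdf2svg", str(pdf_path), str(pattern), "all"]


def convert_folder(in_dir, out_dir):
    """Convert every *.pdf in in_dir; return the PDFs that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        result = subprocess.run(pdf2svg_command(pdf, out_dir))
        if result.returncode != 0:
            failed.append(pdf)  # keep going, report the bad ones at the end
    return failed
```

Collecting failures instead of stopping at the first one is handy here, since a handful of copy-protected or broken files shouldn't block the rest of the batch.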
@douglasmiranda
Author

Thanks douglas for the epic documentation right there. It's truly wonderful how Google can find these gists and point users to them. Very useful.

Thanks :) I would add, now in 2024, mupdf as a viable tool.

@XinyuIDR

Many thanks for your review on our product!
The jpdf2html5 link has been outdated now, I assume you were pointing to the BuildVu.

@douglasmiranda
Author

Many thanks for your review on our product! The jpdf2html5 link has been outdated now, I assume you were pointing to the BuildVu.

Thanks =)
