@XinyuIDR
Last active November 27, 2024 11:47
Convert PDF to HTML in command line | PDF to HTML cmd | PDF to HTML

Convert PDF to HTML from the command line 

You can run BuildVu from the command line to convert PDF to HTML, which is useful when you need to call the converter from another language or script.

Prerequisites: 

BuildVu-HTML: 

java -Xmx512M -jar buildvu-html.jar /inputDirectory/ /outputDirectory/

If a conversion runs out of memory, increase the -Xmx value (for example, -Xmx1G) to give the JVM a larger heap.
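As a rough sketch of calling the converter from a script, the wrapper below builds the same command with a configurable heap size. The names run_buildvu, BUILDVU_JAR, BUILDVU_HEAP and DRY_RUN are hypothetical and not part of BuildVu; only the java command line itself comes from the example above.

```shell
#!/bin/sh
# Hypothetical wrapper around the BuildVu-HTML command line.
# run_buildvu, BUILDVU_JAR, BUILDVU_HEAP and DRY_RUN are illustrative names.

run_buildvu() {
  # $1 = input directory, $2 = output directory
  heap="${BUILDVU_HEAP:-512M}"            # value passed to the JVM -Xmx flag
  jar="${BUILDVU_JAR:-buildvu-html.jar}"  # path to the BuildVu jar (assumed location)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo java "-Xmx${heap}" -jar "$jar" "$1" "$2"
  else
    java "-Xmx${heap}" -jar "$jar" "$1" "$2"
  fi
}

# Preview the command with a 2 GB heap; drop DRY_RUN=1 to run the conversion.
DRY_RUN=1 BUILDVU_HEAP=2G run_buildvu /inputDirectory/ /outputDirectory/
```

Keeping the heap size in an environment variable means a calling script can raise it for large documents without editing the wrapper.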

IDRViewer or Content mode?

The default mode generates the document inside the IDRViewer. To generate just the raw content for use inside your own custom solution, set -Dorg.jpedal.pdf2html.viewMode=content:

java -Dorg.jpedal.pdf2html.viewMode=content -jar buildvu-svg.jar /inputDirectory/ /outputDirectory/

How are the settings controlled? 

When running from the command line, settings are controlled by passing in system properties. Available settings and their values can be found in the Conversion Options section.

java -Dorg.jpedal.pdf2html.compressImages=true -jar buildvu-html.jar /inputDirectory/ /outputDirectory/
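When several settings are needed, typing the full property prefix each time gets repetitive. The sketch below expands short name=value pairs into -D flags, assuming every setting shares the org.jpedal.pdf2html. prefix seen in the examples here. The function name to_system_properties is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: expand short name=value settings into -D system
# property flags. Assumes all settings use the org.jpedal.pdf2html. prefix.

to_system_properties() {
  for setting in "$@"; do
    # Emit one flag per line, e.g. -Dorg.jpedal.pdf2html.viewMode=content
    printf '%s\n' "-Dorg.jpedal.pdf2html.${setting}"
  done
}

# Print the flags for two settings used in this document.
to_system_properties viewMode=content compressImages=true
```

The printed flags can be pasted in front of -jar in the java command. If they are substituted into a command unquoted, values containing spaces (such as some LibreOffice paths) will be split by the shell, so those are better passed by hand.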

Office Document Support: 

Although BuildVu’s primary function is to convert PDF files to HTML5, it can also convert Office documents to HTML5 by using LibreOffice to pre-convert them to PDF.

After you have installed LibreOffice, simply pass in the absolute path to the LibreOffice executable as a system property to enable conversion of Office documents to HTML5 from the command line.

java -Dorg.jpedal.pdf2html.libreOfficeExecutablePath="/path/to/soffice" -jar buildvu-svg.jar /inputDir/ /outputDir/

Conversion will fail if a PDF file with the same filename as the Office document already exists. To avoid this, allow the existing file to be overwritten by setting -Dorg.jpedal.pdf2html.allowLibreOfficeOverwrite=true
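If you would rather not enable overwriting, one option is to look for clashes before converting. The sketch below lists PDF files that share a base name with an Office document in the same directory. It assumes the clashing PDF sits next to the Office file in the input directory, and the extension list is illustrative rather than the set BuildVu supports. The function name find_pdf_collisions is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print each PDF that shares a base name with
# an Office document in the same directory. The extension list is an assumption.

find_pdf_collisions() {
  # $1 = input directory
  for doc in "$1"/*.doc "$1"/*.docx "$1"/*.xls "$1"/*.xlsx "$1"/*.ppt "$1"/*.pptx; do
    [ -e "$doc" ] || continue   # skip patterns that matched no files
    pdf="${doc%.*}.pdf"         # same path with the extension swapped to .pdf
    if [ -e "$pdf" ]; then
      printf '%s\n' "$pdf"
    fi
  done
}

# List clashes in the input directory; no output means none were found.
find_pdf_collisions /inputDir
```

Any path it prints can be renamed or moved before running the conversion.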

If you are running LibreOffice on Linux, some files may not convert correctly if they use fonts that are not available on Linux. We recommend installing Google Noto Fonts to increase the likelihood that missing fonts are substituted with a suitable fallback.
