documentprocessing’s gists

documentprocessing / render-or-view-pdf-document-in-browser-using-pdfjs-javascript-library.html

Last active September 20, 2023 18:06

Render or View PDF Document in Browser using PDF.js JavaScript Library. Check https://products.documentprocessing.com/viewer/javascript/pdf.js/ for more details.

	<!-- This example contains the HTML and JavaScript code needed to demonstrate the PDF.js library
	     by rendering a PDF document in the browser -->
	<html>
	<head>

	<!-- Link to the PDF.js library -->
	<script src="../build/pdf.js"></script>

	</head>
	<body>

documentprocessing / convert-html-to-pdf-via-web-url-in-python-using-weasyprint-library.html

Last active September 20, 2023 18:05

Convert HTML to PDF via Web URL and also with Inline CSS in Python using WeasyPrint Library. Check https://products.documentprocessing.com/conversion/python/weasyprint/ for more details.

	# Import the HTML class from the WeasyPrint library
	from weasyprint import HTML

	# Instantiate the HTML class and call write_pdf() to convert a website URL to PDF
	HTML('https://www.groupdocs.com/').write_pdf('groupdocs-weasyprint.pdf')

documentprocessing / add-annotations-to-images-in-javascript-using-annotorious-library.html

Last active September 2, 2024 12:24

Add annotations to images manually or automatically using JSON in JavaScript using Annotorious Library. Check https://products.documentprocessing.com/annotation/javascript/annotorious/ for more details.

	<html>
	<head>
	<!-- Linking Annotorious Stylesheet -->
	<link rel="stylesheet" href="dist/annotorious.min.css">

	<!-- Integrating Annotorious JavaScript Library -->
	<script type="text/javascript" src="dist/annotorious.min.js"></script>
	</head>

	<body>

documentprocessing / extract-images-from-pdf-in-python-using-pymupdf-library.py

Last active October 20, 2023 10:37

Explore PDF parsing features of PyMuPDF like extracting text, images & tables from PDF, inserting text into PDF or text recognition using OCR etc. Check https://products.documentprocessing.com/parser/python/pymupdf/ for more details.

	# Import PyMuPDF
	import fitz

	# File path you want to extract images from
	file = "data.pdf"

	# Open the file
	pdf_file = fitz.open(file)

	# Iterate over PDF pages

documentprocessing / combine-or-join-multiple-pdfs-in-python-using-pymupdf-library.py

Last active October 10, 2023 08:21

Learn to combine or join multiple PDFs into one, split a PDF into multiple PDFs, rotate and delete PDF pages in Python using PyMuPDF library. Check https://products.documentprocessing.com/merger/python/pymupdf/ for more details.

	# Import PyMuPDF
	import fitz

	# Open first document
	doc1 = fitz.open("documentprocessing.pdf")

	# Open second document
	doc2 = fitz.open("data.pdf")

	# Append document 2 after document 1

documentprocessing / add-rotate-and-crop-pdf-pages-in-python-using-pypdf-library.py

Last active October 18, 2023 07:37

Add, Rotate, Crop, Merge & Split PDF Files in Python using pypdf Library. Check https://products.documentprocessing.com/merger/python/pypdf/ for more details.

	# Import the PdfWriter & PdfReader classes from the pypdf library
	from pypdf import PdfWriter, PdfReader

	# Open PDF document and instantiate writer object for performing operations on the PDF
	reader = PdfReader("documentprocessing.pdf")
	writer = PdfWriter()

	# Add page 1 from reader to output document, unchanged:
	writer.add_page(reader.pages[0])

documentprocessing / extract-attachments-from-pdf-in-python-using-pypdf-library.py

Last active October 18, 2023 13:08

Extract text, images and attachments from PDF files in Python using pypdf Library. Check https://products.documentprocessing.com/parser/python/pypdf/ for more details.

	# Import the PdfReader class from the pypdf library
	from pypdf import PdfReader

	# Open a PDF file
	reader = PdfReader("data.pdf")

	# Iterate through the attachments in the PDF
	for name, content_list in reader.attachments.items():

	# Iterate through the contents in each attachment

documentprocessing / extract-font-information-from-pdf-document-in-python-using-pdfminersix-library.py

Last active October 24, 2023 14:26

Extract Text and Font Information from PDF documents in Python using pdfminer.six Library. Check https://products.documentprocessing.com/parser/python/pdfminer.six/ for more details.

	# Import required classes from the pdfminer.six library
	from pdfminer.pdfparser import PDFParser
	from pdfminer.pdfdocument import PDFDocument
	from pdfminer.pdfpage import PDFPage
	from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
	from pdfminer.converter import PDFPageAggregator

	# Open the PDF file
	with open('documentprocessing.pdf', 'rb') as pdf_file:

documentprocessing / convert-pdf-to-html-in-python-using-pdfminersix-library.py

Last active October 27, 2023 02:45

Convert PDF to HTML and PDF to XML in Python using pdfminer.six Library. Check https://products.documentprocessing.com/conversion/python/pdfminer.six/ for more details.

	# Import extract_text_to_fp function from pdfminer.high_level module
	from pdfminer.high_level import extract_text_to_fp

	# Import BytesIO class from io module
	from io import BytesIO

	# Specify the PDF file you want to convert to HTML
	pdf_file = 'documentprocessing.pdf'

	# Create an in-memory buffer to store the HTML output

documentprocessing / add-crossed-out-text-to-pdf-in-javascript-using-pdfkit.js

Last active November 28, 2023 08:24

Add Links, Crossed-Out Text & Interactive Notes Annotations to PDF documents in JavaScript using PDFKit Library. Check https://products.documentprocessing.com/annotation/javascript/pdfkit/ for more details.

	// Include pdfkit library and fs module of Node.js
	const PDFDocument = require('pdfkit');
	const fs = require('fs');

	// Create a new PDF document
	const doc = new PDFDocument();

	// Create a writable stream to save the PDF
	const stream = fs.createWriteStream('annotations.pdf');
