Document Processing documentprocessing

@documentprocessing
documentprocessing / use-css-selectors-to-find-elements.java
Created May 27, 2025 15:25
Use CSS selectors to find elements with jsoup API
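import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse a local file; the base URI is used to resolve relative links
File input = new File("input.html");
Document doc = Jsoup.parse(input, "UTF-8", "https://some-website.com/");
Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png
Element masthead = doc.select("div.masthead").first(); // div with class=masthead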
documentprocessing / parse-html-using-jsoup.java
Created May 27, 2025 15:17
Parse HTML in Java using jsoup API
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head>"
    + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
documentprocessing / adding-header-footer-to-pdf-pages.java
Created May 21, 2025 13:03
Adding Header and Footer to PDF in Java
documentprocessing / text-extraction-java-pdfbox.java
Created May 21, 2025 12:59
Text Extraction from PDF using PDFBox in Java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;

public class PDFTextExtractor {
    public static void main(String[] args) {
        // Path to your PDF file
        String filePath = "sample.pdf";
documentprocessing / create-pdf-with-apache-pdfbox.java
Created May 21, 2025 12:56
Create New PDF File from Scratch using Apache PDFBox
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import java.io.IOException;

public class CreatePDFExample {
    public static void main(String[] args) {
        // 1. Create a new empty document
using UglyToad.PdfPig;

class PdfInspector
{
    public void Inspect(string filePath)
    {
        using var document = PdfDocument.Open(filePath);
        // Document metadata
        Console.WriteLine($"Title: {document.Information.Title}");
documentprocessing / text-positional-analysis-dotnet.cs
Created May 13, 2025 02:59
Text Positional Analysis in .NET
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

class TableExtractor
{
    public void AnalyzeDocument(string filePath)
    {
        using var document = PdfDocument.Open(filePath);
        foreach (var page in document.GetPages())
using UglyToad.PdfPig;
using System;

class Program
{
    static void Main()
    {
        using var document = PdfDocument.Open("document.pdf");
        foreach (var page in document.GetPages())
documentprocessing / jstree-checkboxes-dnd.js
Created May 6, 2025 00:37
Checkboxes and Drag and Drop using Javascript
$(document).ready(function() {
  $('#advancedTree').jstree({
    'plugins': ['checkbox', 'dnd'],
    'core': {
      'data': [
        {
          "text": "Documents",
          "state": { "opened": true },
          "children": [
            { "text": "Project.docx", "type": "file" },
documentprocessing / jstree-from-json.js
Created May 6, 2025 00:35
Loading Data with JSON
$(document).ready(function() {
  $('#jsonTree').jstree({
    'core': {
      'data': [
        {
          "text": "Root Node",
          "children": [
            {
              "text": "Child Node 1",
              "icon": "fa fa-file"