Basic Text Extraction in .NET
using System;
using UglyToad.PdfPig;

class Program
{
    static void Main()
    {
        // Open the PDF; the using declaration disposes the document when Main exits.
        using var document = PdfDocument.Open("document.pdf");

        foreach (var page in document.GetPages())
        {
            // Print the page number followed by the page's full extracted text.
            Console.WriteLine($"Page {page.Number}:");
            Console.WriteLine(page.Text);

            // Get individual words with positions
            foreach (var word in page.GetWords())
            {
                Console.WriteLine($"Word: '{word.Text}' at {word.BoundingBox}");
            }
        }
    }
}