@johnidm
Created January 29, 2025 13:49
PHP - Parsing PDF to Txt

How to extract the text from a PDF file with PHP.

Install the dependency with Composer: composer require smalot/pdfparser

Run the following code:

<?php
require __DIR__ . '/vendor/autoload.php';

use Smalot\PdfParser\Parser; # composer require smalot/pdfparser

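// Extracts all text from the PDF at $filePath.
function parsePdf(string $filePath): string
{
    // Fail early with a clear message if the file is missing or unreadable.
    if (!is_readable($filePath)) {
        throw new Exception("Cannot read file: $filePath");
    }

    // parseFile() returns a Document and throws an exception if parsing fails.
    $parser = new Parser();
    $pdf = $parser->parseFile($filePath);

    return $pdf->getText();
}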

$filePath = 'document.pdf';
try {
    $text = parsePdf($filePath);
    if (trim($text) !== '') {
        echo $text;
    } else {
        echo "No text found.";
    }
} catch (Exception $e) {
    echo "Failed to parse PDF: " . $e->getMessage();
}
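
The script above only prints the extracted text. To actually end up with a .txt file, something like the following could be added. This is a minimal sketch: saveTextAsTxt is a hypothetical helper name, not part of smalot/pdfparser, and it assumes the output should sit next to the source PDF with the extension swapped to .txt.

```php
<?php
// Hypothetical helper (not part of smalot/pdfparser): writes the extracted
// text next to the source PDF, replacing the extension with .txt.
function saveTextAsTxt(string $pdfPath, string $text): string
{
    // pathinfo() splits the path; 'filename' is the base name without extension.
    $info = pathinfo($pdfPath);
    $dir = $info['dirname'] ?? '.';
    $txtPath = $dir . DIRECTORY_SEPARATOR . $info['filename'] . '.txt';

    // file_put_contents() returns false when the write fails.
    if (file_put_contents($txtPath, $text) === false) {
        throw new RuntimeException("Could not write $txtPath");
    }

    return $txtPath;
}
```

Inside the try block, the echo could then be replaced with saveTextAsTxt($filePath, $text), which for 'document.pdf' would write to document.txt in the same directory.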