@lewoudar
Last active August 12, 2024 19:36
Example usage of pdfminer to extract text and images from a PDF
```python
from pdfminer.high_level import extract_text_to_fp
from pdfminer.layout import LAParams


def extract_text_from_pdf(input_filename: str, output_filename: str, output_images_dir: str | None = None) -> None:
    # Read the PDF as bytes and write the extracted text as UTF-8.
    with open(input_filename, 'rb') as input_file, open(output_filename, 'w', encoding='utf-8') as output_file:
        # LAParams() enables layout analysis; when output_dir is given,
        # pdfminer also writes the images it extracts into that directory.
        extract_text_to_fp(input_file, output_file, output_dir=output_images_dir, laparams=LAParams())


extract_text_from_pdf(
    "C:\\Users\\rolla\\Downloads\\react systems - LLM.pdf",
    'react.txt',
    'C:\\Users\\rolla\\Downloads\\images'
)
```