Last active
August 12, 2024 19:36
Example usage of pdfminer to extract text and images from a PDF
from pdfminer.high_level import extract_text_to_fp
from pdfminer.layout import LAParams


def extract_text_from_pdf(input_filename: str, output_filename: str, output_images_dir: str | None = None) -> None:
    # Read the PDF as bytes and write the extracted text as UTF-8.
    with open(input_filename, 'rb') as input_file, open(output_filename, 'w', encoding='utf-8') as output_file:
        # When output_dir is set, embedded images are also saved to that directory.
        extract_text_to_fp(input_file, output_file, output_dir=output_images_dir, laparams=LAParams())


extract_text_from_pdf(
    "C:\\Users\\rolla\\Downloads\\react systems - LLM.pdf",
    'react.txt',
    'C:\\Users\\rolla\\Downloads\\images'
)
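
The call above hardcodes absolute Windows paths. A minimal sketch of one way to avoid that, assuming you want the text file and the images directory named after the PDF and placed under a single output folder. The names `derive_output_paths` and `extract_pdf` are hypothetical helpers introduced here for illustration and are not part of pdfminer.

```python
from pathlib import Path


def derive_output_paths(pdf_path: str, output_root: str) -> tuple[Path, Path]:
    """Return (text_file, images_dir) under output_root, named after the PDF."""
    stem = Path(pdf_path).stem
    root = Path(output_root)
    return root / f"{stem}.txt", root / f"{stem}_images"


def extract_pdf(pdf_path: str, output_root: str) -> Path:
    # Imported here so the path helper above works without pdfminer installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    text_file, images_dir = derive_output_paths(pdf_path, output_root)
    # Make sure the folder for the text file exists before opening it.
    text_file.parent.mkdir(parents=True, exist_ok=True)
    with open(pdf_path, "rb") as src, open(text_file, "w", encoding="utf-8") as dst:
        extract_text_to_fp(src, dst, output_dir=str(images_dir), laparams=LAParams())
    return text_file
```

With this sketch, `extract_pdf("report.pdf", "out")` would write the text to `out/report.txt` and pass `out/report_images` as the image directory.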