nadya-p / pdf_to_text.py
Last active August 15, 2022 04:42
Extract text contents of PDF files recursively
from tika import parser
import os


def extract_text_from_pdfs_recursively(dir):
    for root, dirs, files in os.walk(dir):
        for file in files:
            path_to_pdf = os.path.join(root, file)
            [stem, ext] = os.path.splitext(path_to_pdf)
            if ext == '.pdf':