nadya-p / pdf_to_text.py

Last active August 15, 2022 04:42

Extract text contents of PDF files recursively
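The recursive file discovery that this description refers to can be sketched on its own with the standard library. This is a minimal illustration, not the gist's code: the helper name `find_pdfs` is hypothetical, and matching the extension case-insensitively (so that `.PDF` is also picked up) is an assumption about what is wanted.

```python
from pathlib import Path


def find_pdfs(root_dir):
    """Return the paths of all PDF files under root_dir, searched recursively.

    Hypothetical helper for illustration; the name and the
    case-insensitive extension match are assumptions.
    """
    # rglob("*") visits every entry in every subdirectory;
    # suffix.lower() makes ".PDF" and ".Pdf" match as well as ".pdf".
    return sorted(
        p for p in Path(root_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Each returned path could then be handed to whatever text extractor is in use.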

	from tika import parser
	import os


	def extract_text_from_pdfs_recursively(dir):
	    for root, dirs, files in os.walk(dir):
	        for file in files:
	            path_to_pdf = os.path.join(root, file)
	            stem, ext = os.path.splitext(path_to_pdf)
	            if ext == '.pdf':