Created
March 19, 2020 11:49
Python program that extracts all the text from a PDF file and saves it as a text file. It uses the PyPDF2 library.
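from PyPDF2 import PdfFileReader  # legacy name; PyPDF2 3.0 removed it in favor of PdfReader

fileName = ''  # Enter the name of the PDF file here, without the .pdf extension

# Open the output file once, outside the loop, so every page is kept
# (reopening it in 'w' mode per page would leave only the last page)
with open(fileName + '.pdf', 'rb') as pdf, open(fileName + '.txt', 'w', encoding='utf-8') as textFile:
    pdfReader = PdfFileReader(pdf)
    totalPages = pdfReader.numPages
    for page in range(totalPages):
        # Extract the text of the current page and append it to the output
        text = pdfReader.getPage(page).extractText()
        textFile.write(text + '\n')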
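
The script above relies on the older PyPDF2 names (`PdfFileReader`, `numPages`, `getPage`, `extractText`). As a rough sketch of the same idea using the newer `PdfReader` interface, something like the following might work. The function names `join_pages` and `pdf_to_text` are made up for illustration and are not part of the original gist, and the sketch assumes a PyPDF2 release that provides `PdfReader`.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page strings with newlines, treating None or empty pages as ''."""
    return "\n".join(text or "" for text in page_texts)


def pdf_to_text(pdf_path):
    """Write the text of every page of pdf_path to a .txt file next to it."""
    # Imported here so join_pages stays usable without PyPDF2 installed.
    # Assumption: the installed PyPDF2 exposes PdfReader.
    from PyPDF2 import PdfReader

    pdf_path = Path(pdf_path)
    reader = PdfReader(str(pdf_path))
    # Collect the text of each page, then write everything in one go.
    text = join_pages(page.extract_text() for page in reader.pages)
    out_path = pdf_path.with_suffix(".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling `pdf_to_text('report.pdf')` (a hypothetical file name) would then be expected to write `report.txt` beside the input. Building the full text first and writing it once avoids the overwrite problem that comes from reopening the output file for every page.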