@olivx
Created January 15, 2019 18:47
extract text from pdf
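import PyPDF2

# absolute_file_name: path to the PDF file, expected to be defined by the caller.
# These camelCase names are the legacy PyPDF2 API (removed in PyPDF2 3.0).
pdfReader = PyPDF2.PdfFileReader(absolute_file_name)
num_pages = pdfReader.numPages
count = 0
text_from_pdf = ""
# Walk the pages in order and append each page's extracted text.
while count < num_pages:
    pageObj = pdfReader.getPage(count)
    count += 1
    text_from_pdf += pageObj.extractText()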
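
The snippet above relies on the legacy PyPDF2 names. For newer installs, the same page-by-page concatenation could look roughly like the sketch below. It assumes the successor package `pypdf` is installed; the helper names `join_page_text` and `extract_text_from_pdf` are hypothetical and not part of the original gist.

```python
def join_page_text(pages):
    """Concatenate the text of each page object, in order.

    `pages` is any iterable of objects exposing extract_text(),
    such as the `pages` attribute of a pypdf PdfReader.
    """
    # `or ""` guards against a page that yields no text at all
    return "".join(page.extract_text() or "" for page in pages)


def extract_text_from_pdf(path):
    """Hypothetical wrapper: open the PDF at `path` and return all of its text."""
    # Assumption: pypdf is installed; imported here so join_page_text
    # stays usable (and testable) without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return join_page_text(reader.pages)
```

Calling `extract_text_from_pdf(absolute_file_name)` would then play the role of `text_from_pdf` in the original loop. Iterating over `reader.pages` removes the need for the manual `count` index.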