Created December 13, 2017 15:20
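import tabula

# reading table using tabula (filepath is the path to the statement PDF)
# note: 'error_bad_lines' and 'warn_bad_lines' were deprecated in pandas 1.3
# in favour of 'on_bad_lines', so newer pandas versions may reject them
rows = tabula.read_pdf(filepath,
                       pages='all',
                       silent=True,
                       pandas_options={
                           'header': None,
                           'error_bad_lines': False,
                           'warn_bad_lines': False
                       })

# converting to list (assumes read_pdf returned a single DataFrame)
rows = rows.values.tolist()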
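
The conversion step above assumes `read_pdf` returned a single DataFrame. Newer tabula-py releases return a list of DataFrames by default, so a small helper that accepts either shape may be useful. The following is a minimal sketch; the name `tables_to_rows` is hypothetical and not part of tabula-py.

```python
def tables_to_rows(result):
    """Flatten tabula output into one list of row lists.

    `result` may be a single DataFrame or a list of DataFrames.
    `tables_to_rows` is a hypothetical helper, not a tabula-py function.
    """
    # Normalise to a list so both return shapes go through the same loop.
    tables = result if isinstance(result, list) else [result]
    rows = []
    for table in tables:
        # DataFrame.values.tolist() yields one Python list per table row.
        rows.extend(table.values.tolist())
    return rows
```

With this helper, the last line of the snippet would become `rows = tables_to_rows(rows)`.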
Hi Samkit,
I followed your article to build a similar analyser, but in practice it has not been as simple as the article suggests. So I wanted to confirm: with the snippet above, are you able to extract all the relevant data, or are some data points missing (for me, many are)? Also, for most statements tabula is not able to find the table at all, so I used the area option, and the area was different for every bank. Could you let me know whether this snippet is actually what gave you 90% accuracy?
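
For reference, the per-bank `area` approach described in this comment could be organised as a lookup table. This is only a sketch: the bank names and coordinates are placeholders rather than measured values, and `BANK_AREAS`, `read_options`, and `read_statement` are hypothetical names. tabula-py's `area` option takes `(top, left, bottom, right)` in PDF points.

```python
# Hypothetical per-bank table regions as (top, left, bottom, right) in points.
# Both the bank keys and the numbers are placeholders for illustration.
BANK_AREAS = {
    "bank_a": (150, 30, 780, 580),
    "bank_b": (200, 20, 760, 590),
}


def read_options(bank):
    """Build keyword arguments for tabula.read_pdf for the given bank."""
    options = {
        "pages": "all",
        "silent": True,
        "pandas_options": {"header": None},
    }
    area = BANK_AREAS.get(bank)
    if area is not None:
        # Restrict extraction to the known table region for this bank.
        options["area"] = list(area)
    return options


def read_statement(filepath, bank):
    """Read a statement PDF, falling back to auto-detection for unknown banks."""
    # Imported here so read_options can be used without tabula/Java installed.
    import tabula

    return tabula.read_pdf(filepath, **read_options(bank))
```

Keeping the coordinates in one dictionary means supporting a new bank only requires adding one entry, and unknown banks still go through tabula's own table detection.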