@bzamecnik
Last active July 18, 2018 15:22
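import pandas as pd
import rossum

# Extract the invoice fields.
extracted = rossum.extract('invoice.pdf')

# Keep only the highest-scoring candidate for each field name.
df = pd.DataFrame.from_dict(extracted['fields'])
idx = df.groupby('name')['score'].idxmax()
# idxmax returns index labels, so select rows with .loc rather than .iloc.
print(df.loc[idx][['name', 'value']].to_string(index=False))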
name               value
amount_due         0.00
amount_paid        99.00
amount_total       99.00
amount_total_base  99.00
date_due           2017-11-11
date_issue         2017-11-11
sender_addrline    19989 Stevens Creek Boulevard
sender_name        odrive
terms              On-Receipt
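
For readers who want to see what the "max score only" step computes, here is a minimal sketch of the same selection in plain Python, without pandas or the Rossum client. The sample records and the helper name `best_by_name` are hypothetical. The `name`/`value`/`score` layout is assumed from the columns used in the snippet above, not taken from the Rossum documentation.

```python
# Hypothetical sample data: several candidates per field name, each with a score.
# The name/value/score layout is an assumption based on the snippet above.
fields = [
    {'name': 'amount_total', 'value': '99.00', 'score': 0.9},
    {'name': 'amount_total', 'value': '9.00', 'score': 0.4},
    {'name': 'date_issue', 'value': '2017-11-11', 'score': 0.8},
    {'name': 'date_issue', 'value': '2017-11-01', 'score': 0.3},
]


def best_by_name(fields):
    """Return (name, value) pairs, keeping the highest-scoring candidate per name."""
    best = {}
    for field in fields:
        current = best.get(field['name'])
        # Strict '>' keeps the first candidate on ties, as idxmax does.
        if current is None or field['score'] > current['score']:
            best[field['name']] = field
    # Sort by name to mirror the sorted group keys that groupby produces by default.
    return [(name, best[name]['value']) for name in sorted(best)]


for name, value in best_by_name(fields):
    print(name, value)
```

The pandas version does the same thing in two steps: `groupby('name')['score'].idxmax()` finds the row label of the best candidate in each group, and the row lookup then pulls out `name` and `value` for those rows.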