@suvodeep-pyne
Created January 4, 2022 05:40
Convert a spreadsheet from PDF to CSV
import pandas as pd
import tabula
def rh_pdf_to_csv(pdf_filepath, csv_filepath):
    df_list = tabula.read_pdf(pdf_filepath, pages='all', pandas_options={'header': None})
    df = pd.concat(df_list, axis=0, ignore_index=True)
    # Take the first row as header
    new_header = df.iloc[0]  # grab the first row for the header
    df = df[1:]  # take the data less the header row
    df.columns = new_header  # set the header row as the df header
    df.to_csv(csv_filepath, index=False)
# Usage
filepath = '/path/to/Robinhoodrh-gains-losses.pdf'
csv_filepath = '/path/to/Robinhoodrh-gains-losses.csv'
rh_pdf_to_csv(filepath, csv_filepath)
print('done!')
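
The header step above can be tried in isolation with pandas alone, with no PDF or tabula involved. The sketch below is not part of the original gist: the helper name `promote_first_row_to_header`, the `drop_repeated_headers` flag, and the sample rows are all hypothetical. It also assumes that a PDF which repeats its column titles on every page yields rows identical to the first row after concatenation, and drops them. That may or may not match a given Robinhood export.

```python
import pandas as pd


def promote_first_row_to_header(df, drop_repeated_headers=True):
    """Use the first row of df as column names and return the remaining rows.

    Hypothetical helper, not from the original gist.
    """
    header = df.iloc[0]
    body = df.iloc[1:]
    if drop_repeated_headers:
        # Assumption: each page repeats the header, producing rows that are
        # identical to the first row once the per-page frames are concatenated.
        is_repeat = body.eq(header, axis=1).all(axis=1)
        body = body[~is_repeat]
    body = body.reset_index(drop=True)
    body.columns = header.tolist()
    return body


# Made-up stand-ins for the per-page frames that tabula.read_pdf would return.
pages = [
    pd.DataFrame([["Symbol", "Qty"], ["AAPL", "1"]]),
    pd.DataFrame([["Symbol", "Qty"], ["MSFT", "2"]]),
]
combined = pd.concat(pages, axis=0, ignore_index=True)
out = promote_first_row_to_header(combined)
print(out)
```

If the header appears only once in the PDF, passing `drop_repeated_headers=False` gives the same result as the original function.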