Skip to content

Instantly share code, notes, and snippets.

@fsndzomga
Created September 25, 2023 21:13
Show Gist options
  • Save fsndzomga/a8b1914267f07451bdc1efb38a1e7cb6 to your computer and use it in GitHub Desktop.
Save fsndzomga/a8b1914267f07451bdc1efb38a1e7cb6 to your computer and use it in GitHub Desktop.
Code to load the dataset into a pandas data frame
import openpyxl
import pandas as pd
# Load the Excel file
workbook = openpyxl.load_workbook('online_retail_II.xlsx')
# Create an empty list to store data from all sheets
all_data = []
# Iterate through all sheets in the workbook
for sheet_name in workbook.sheetnames:
sheet = workbook[sheet_name]
# Create a list of dictionaries where each dictionary represents a row of data
sheet_data = []
for row in sheet.iter_rows(min_row=2, values_only=True):
row_dict = {
'Invoice': row[0],
'StockCode': row[1],
'Description': row[2],
'Quantity': row[3],
'InvoiceDate': row[4],
'Price': row[5],
'Customer ID': row[6],
'Country': row[7]
}
sheet_data.append(row_dict)
# Convert the list of dictionaries into a DataFrame for each sheet
sheet_df = pd.DataFrame(sheet_data)
# Append the sheet's DataFrame to the list
all_data.append(sheet_df)
# Concatenate all DataFrames into one
combined_data = pd.concat(all_data, ignore_index=True)
# Close the Excel file
workbook.close()
# Now, you have a single DataFrame "combined_data" containing data from all sheets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment