Skip to content

Instantly share code, notes, and snippets.

@hellpanderrr
Last active March 24, 2021 04:03
Show Gist options
  • Save hellpanderrr/f5bc0f70ccb637d6fe78 to your computer and use it in GitHub Desktop.
Save hellpanderrr/f5bc0f70ccb637d6fe78 to your computer and use it in GitHub Desktop.
Read Excel XML table in a python list
from bs4 import BeautifulSoup
 
def read_excel_xml(path):
    file = open(path).read()
    soup = BeautifulSoup(file,'xml')
    workbook = []
    for sheet in soup.findAll('Worksheet'): 
        sheet_as_list = []
        for row in sheet.findAll('Row'):
            row_as_list = []
            for cell in row.findAll('Cell'):
                row_as_list.append(cell.Data.text)
            sheet_as_list.append(row_as_list)
        workbook.append(sheet_as_list)
    return workbook
@meGregV
Copy link

meGregV commented Mar 25, 2019

очень неплохо, жаль что медленно

@tuf22191
Copy link

tuf22191 commented Feb 21, 2020

I'd be very careful with this implementation - it assumes you have no empty cells in your table . In reality if you have empty cells -> the cell will have a "ss:Index" attribute that you have to check -> this is what allows Excel to store large sparse tables in a compact form. This will throw the code above off. For example, in one row you will have below cells:
<Cell><Data ss:Type="String">TEST</Data></Cell> <Cell ss:Index="20"><Data ss:Type="Number">1</Data></Cell>
This means that the second cell is at column 20 (1's based indexing in stupid Excel!) . The first cell (the string "Test" ) is in the first column. So you will have 18 empty cells in between which are not going to be in the XML. To solve this:
check if ss:Index is present in the cell ---->

if ( type(cell['ss:Index']) == type(None) ) : (do work)

or alternatively

if ( cell['ss:Index'] == None ) : (do work)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment