-
-
Save mdellavo/853413 to your computer and use it in GitHub Desktop.
try: | |
import openpyxl | |
def XLSXDictReader(f): | |
book = openpyxl.reader.excel.load_workbook(f) | |
sheet = book.get_active_sheet() | |
rows = sheet.get_highest_row() | |
cols = sheet.get_highest_column() | |
headers = dict( (i, sheet.cell(row=0, column=i).value) \ | |
for i in range(cols) ) | |
def item(i, j): | |
return (sheet.cell(row=0, column=j).value, | |
sheet.cell(row=i, column=j).value) | |
return (dict(item(i,j) for j in range(cols)) for i in range(1, rows)) | |
except ImportError: | |
XLSXDictReader = None |
This is what works for me on openpyxl 2.4.8 using sheet.max_row
and sheet.max_column
def XLSXDictReader(f):
book = openpyxl.reader.excel.load_workbook(f)
sheet = book.get_active_sheet()
rows = sheet.max_row
cols = sheet.max_column
headers = dict((i, sheet.cell(row=1, column=i).value) for i in range(1, cols))
def item(i, j):
return (sheet.cell(row=1, column=j).value, sheet.cell(row=i, column=j).value)
return (dict(item(i, j) for j in range(1, cols + 1)) for i in range(2, rows + 1))
Thanks for doing the hard work!
@erinkeith @adejones thank you both for continuing to improve on this! (and @erinkeith I'm sorry I didn't see the notification when you commented).
I'm glad to take a few lines of code out with the switch to .max_row
and .max_column
. The current openpyxl documentation says that get_active_sheet()
is deprecated and should be replaced with just .active
, which is working for me. I appreciate the simplicity & conciseness of @adejones's version, but for the files I work with I often run into problems because of empty cells being returned as None
rather than the empty strings that Python's built-in CSV support treats them as. And I sometimes need to be able to specify which worksheet to use, so here's my latest version in case it helps anyone:
def XLSXDictReader(fileName, sheetName=None):
book = openpyxl.reader.excel.load_workbook(fileName)
# if there's no sheet name specified, try to get the active sheet. This will work reliably for workbooks with only one sheet; unpredictably if there are multiple worksheets present.
if sheetName is None:
sheet = book.active
elif sheetName not in book.sheetnames:
print(sheetName, "not found in", fileName)
exit()
else:
sheet = book[sheetName]
rows = sheet.max_row + 1
cols = sheet.max_column + 1
def cleanValue(s):
if s == None:
return ''
else:
return str(s).strip()
def item(i, j):
return (
cleanValue(sheet.cell(row=1, column=j).value),
cleanValue(sheet.cell(row=i, column=j).value)
)
return (dict(item(i,j) for j in range(1, cols)) for i in range(2, rows))
Tested with openpyxl==2.6.1
only.
@eldang, openpyxl does have attributes for the work sheet to get you the max_row and max_columns:
rows = sheet.max_row
cols = sheet.max_column