Skip to content

Instantly share code, notes, and snippets.

@tchen
Last active August 13, 2024 15:04
Show Gist options
  • Save tchen/01d1d61a985190ff6b71fc14c45f95c9 to your computer and use it in GitHub Desktop.
Save tchen/01d1d61a985190ff6b71fc14c45f95c9 to your computer and use it in GitHub Desktop.
openpyxl: dealing with merged cells
# When exporting excel spreadsheets with merged cells, only the first cell of the merged cell has a value
# This snippet allow you to take the value from the first cell for all the other cells within the merged range
# Tested with openpyxl 3.0.7 as of 2021-06-17
#
# References:
# https://stackoverflow.com/questions/39574991/how-to-detect-merged-cells-in-excel-with-openpyxl
# https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.merge.html
import openpyxl
def parent_of_merged_cell(cell):
""" Find the parent of the merged cell by iterating through the range of merged cells """
sheet = cell.parent
child_coord = cell.coordinate
# Note: if there are many merged cells in a large spreadsheet, this may become inefficient
for merged in sheet.merged_cells.ranges:
if child_coord in merged:
return merged.start_cell.coordinate
return None
def cell_value(cell):
""" Reads the value of a cell, if cell is within a merged cell,
find the first cell in the merged cell and get its value
"""
if isinstance(cell, openpyxl.cell.cell.Cell):
return cell.value
if isinstance(cell, openpyxl.cell.cell.MergedCell):
coord = parent_of_merged_cell(cell)
parent = cell.parent[coord]
return parent.value
workbook = openpyxl.load_workbook(filename)
sheet = workbook['Some Sheet']
# Say A1:A4 are merged, only the first cell have a value
sheet[A1].value # has value
sheet[A2].value # is None
cell_value(sheet[A1]) # returns sheet[A1].value
cell_value(sheet[A2]) # returns sheet[A1].value
cell_value(sheet[A3]) # returns sheet[A1].value
cell_value(sheet[A4]) # returns sheet[A1].value
@WestonAGreene
Copy link

WestonAGreene commented Aug 13, 2024

Here's my take:
To read an Excel file where merged cells are filled in (in other words, the Pandas DataFrame values are all the same), I used the following code:

from openpyxl import load_workbook
import pandas as pd

def _convert_cell_ref_to_df_ref(cell_ref: tuple) -> tuple:
    return tuple([coord - 1 for coord in cell_ref])

file_path = '/file/path.xlsx'  # Will not work for `.xls`
sheet_name = 'sheet name'

excel = pd.ExcelFile(file_path)
df = excel.parse(
    sheet_name=sheet_name,
    header=None,
)
openpyxl_wb = load_workbook(file_path)

for merged_cell in openpyxl_wb[sheet_name].merged_cells:
    try:
        merge_val = df.iloc[_convert_cell_ref_to_df_ref(next(iter(merged_cell.cells)))]
        for cell in merged_cell.cells:
            df.iloc[_convert_cell_ref_to_df_ref(cell)] = merge_val
    except IndexError as e:
        print(f"Most likely the last row in this Excel file is a blank merged cell, which is often times trimmed when read by Pandas.")
        print(e)

The StackOverflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment