@angstwad
Last active December 22, 2024 16:02
Recursive dictionary merge in Python
import collections


def dict_merge(dct, merge_dct):
    """ Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
    updating only top-level keys, dict_merge recurses down into dicts nested
    to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
    ``dct``.

    :param dct: dict onto which the merge is executed
    :param merge_dct: dict merged into dct
    :return: None
    """
    # Python 2 idiom; on Python 3 use merge_dct.items() instead.
    for k, v in merge_dct.iteritems():
        if (k in dct and isinstance(dct[k], dict) and isinstance(merge_dct[k], dict)):  #noqa
            dict_merge(dct[k], merge_dct[k])
        else:
            dct[k] = merge_dct[k]
# Copyright 2024 Paul Durivage
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
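
For Python 3 readers, here is a minimal sketch of the same in-place merge next to plain `dict.update()`, to show what the recursion buys you. The sample dictionaries `defaults` and `overrides` are made up for illustration and are not part of the gist.

```python
def dict_merge(dct, merge_dct):
    """Recursively merge merge_dct into dct in place (Python 3 sketch)."""
    for k, v in merge_dct.items():
        if isinstance(dct.get(k), dict) and isinstance(v, dict):
            dict_merge(dct[k], v)
        else:
            dct[k] = v


# Hypothetical sample data, not from the gist.
defaults = {"db": {"host": "localhost", "port": 5432}, "debug": False}
overrides = {"db": {"port": 6543}, "debug": True}

# dict.update() replaces the nested "db" dict wholesale, so "host" is lost.
shallow = {"db": dict(defaults["db"]), "debug": defaults["debug"]}
shallow.update(overrides)

# dict_merge() descends into "db" and only overwrites "port".
dict_merge(defaults, overrides)

print(shallow)
print(defaults)
```

Note that the function returns `None` and modifies its first argument, just like `dict.update()`.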
@eligundry


You da real MVP

@cetanu

cetanu commented Nov 27, 2017


I love you, I appreciate you

@BrianAndersen78


I love you too!

@softwarevamp


Is it possible to use this when calling yaml.load?

@hawksight


Just what I needed, thank you.
Note for python3 users, turn .iteritems() to .items().

@softwarevamp - I'm not sure where you want to load the YAML, but here's an example in which merge_dct is loaded from YAML before the function call.

import yaml

master_dict = {}
with open('/path/file.yaml', 'r') as f:  # placeholder path
    dict_from_yaml = yaml.safe_load(f)
dict_merge(master_dict, dict_from_yaml)
print(yaml.safe_dump(master_dict))

Note that this assumes the file contains only one YAML document.
Hope that helps.

@wskinner

wskinner commented Apr 4, 2018


@angstwad would you be willing to add a license to this nice code snippet? I would like to use it at my company, but without a license the lawyers will be very unhappy with me :)

@newmen

newmen commented May 29, 2018

from toolz.dicttoolz import merge_with


def deep_merge(*ds):
    def combine(vals):
        if len(vals) == 1 or not all(isinstance(v, dict) for v in vals):
            return vals[-1]
        else:
            return deep_merge(*vals)
    return merge_with(combine, *ds)
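
To try the snippet above without installing toolz, here is a sketch with a simplified stand-in for `merge_with`. The stand-in is an assumption about the relevant behavior (values for each key are collected into a list and passed to the function), not the library's actual implementation; the sample inputs are made up.

```python
def merge_with(func, *dicts):
    """Simplified stand-in for toolz.dicttoolz.merge_with (assumption:
    values for each key are gathered into a list, then passed to func)."""
    grouped = {}
    for d in dicts:
        for k, v in d.items():
            grouped.setdefault(k, []).append(v)
    return {k: func(vals) for k, vals in grouped.items()}


def deep_merge(*ds):
    def combine(vals):
        # A single value, or any non-dict among the values: last one wins.
        if len(vals) == 1 or not all(isinstance(v, dict) for v in vals):
            return vals[-1]
        return deep_merge(*vals)
    return merge_with(combine, *ds)


# Hypothetical inputs: three dicts merged left to right.
merged = deep_merge(
    {"a": {"x": 1, "y": 2}, "b": 1},
    {"a": {"y": 3}, "c": 4},
    {"a": {"z": 5}, "b": {"nested": True}},
)
print(merged)
```

When a key holds a dict in one input and a non-dict in another, the last value simply replaces the earlier ones.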

@DomWeldon


Here's a Python 3 version with a test case that: a) returns a new dictionary rather than updating the old ones, and b) controls whether to add in keys from merge_dct which are not in dct.

from unittest import TestCase
import collections.abc


def dict_merge(dct, merge_dct, add_keys=True):
    """ Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
    updating only top-level keys, dict_merge recurses down into dicts nested
    to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
    ``dct``.

    This version will return a copy of the dictionary and leave the original
    arguments untouched.

    The optional argument ``add_keys`` determines whether keys which are
    present in ``merge_dct`` but not ``dct`` should be included in the
    new dict.

    Args:
        dct (dict): dict onto which the merge is executed
        merge_dct (dict): dct merged into dct
        add_keys (bool): whether to add new keys

    Returns:
        dict: updated dict
    """
    dct = dct.copy()
    if not add_keys:
        merge_dct = {
            k: merge_dct[k]
            for k in set(dct).intersection(set(merge_dct))
        }

    for k, v in merge_dct.items():
        if (k in dct and isinstance(dct[k], dict)
                and isinstance(merge_dct[k], collections.abc.Mapping)):
            dct[k] = dict_merge(dct[k], merge_dct[k], add_keys=add_keys)
        else:
            dct[k] = merge_dct[k]

    return dct


class DictMergeTestCase(TestCase):
    def test_merges_dicts(self):
        a = {
            'a': 1,
            'b': {
                'b1': 2,
                'b2': 3,
            },
        }
        b = {
            'a': 1,
            'b': {
                'b1': 4,
            },
        }

        assert dict_merge(a, b)['a'] == 1
        assert dict_merge(a, b)['b']['b2'] == 3
        assert dict_merge(a, b)['b']['b1'] == 4

    def test_inserts_new_keys(self):
        """Will it insert new keys by default?"""
        a = {
            'a': 1,
            'b': {
                'b1': 2,
                'b2': 3,
            },
        }
        b = {
            'a': 1,
            'b': {
                'b1': 4,
                'b3': 5
            },
            'c': 6,
        }

        assert dict_merge(a, b)['a'] == 1
        assert dict_merge(a, b)['b']['b2'] == 3
        assert dict_merge(a, b)['b']['b1'] == 4
        assert dict_merge(a, b)['b']['b3'] == 5
        assert dict_merge(a, b)['c'] == 6

    def test_does_not_insert_new_keys(self):
        """Will it avoid inserting new keys when required?"""
        a = {
            'a': 1,
            'b': {
                'b1': 2,
                'b2': 3,
            },
        }
        b = {
            'a': 1,
            'b': {
                'b1': 4,
                'b3': 5,
            },
            'c': 6,
        }

        assert dict_merge(a, b, add_keys=False)['a'] == 1
        assert dict_merge(a, b, add_keys=False)['b']['b2'] == 3
        assert dict_merge(a, b, add_keys=False)['b']['b1'] == 4
        try:
            assert dict_merge(a, b, add_keys=False)['b']['b3'] == 5
        except KeyError:
            pass
        else:
            raise Exception('New keys added when they should not be')

        try:
            assert dict_merge(a, b, add_keys=False)['c'] == 6
        except KeyError:
            pass
        else:
            raise Exception('New keys added when they should not be')

@sylann

sylann commented Jul 19, 2018


@angstwad Your method will fail to update lists.
By the way, why do you test if dct[k] is dict instead of collections.Mapping?

@DomWeldon you are replacing a reference of dct with a shallow copy of itself? This is essentially doing nothing.
Declare another variable and define it with a deepcopy instead:

from copy import deepcopy
dct2 = deepcopy(dct)
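
To make the difference concrete, here is a small sketch comparing `dict.copy()` with `deepcopy`. The sample dictionary is made up; the point is that a shallow copy protects top-level keys, but nested dicts are still shared with the original.

```python
from copy import deepcopy

# Hypothetical sample data.
original = {"b": {"b1": 2}}

shallow = original.copy()   # new outer dict, same inner dict object
deep = deepcopy(original)   # new outer dict and new inner dicts

shallow["top"] = 1          # top-level change: original is unaffected
shallow["b"]["b1"] = 99     # nested change: visible through original too

print("top" in original)
print(original["b"]["b1"])
print(deep["b"]["b1"])
```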

@jpopelka

  • You iterate over key, value in merge_dct but then throw the value away and get the value by index
    solution: use v instead of merge_dct[k]
  • k in dct and isinstance(dct[k], dict) can be simplified to isinstance(dct.get(k), dict)
    for k, v in merge_dct.items():
        if isinstance(dct.get(k), dict) and isinstance(v, collections.abc.Mapping):
            dct[k] = dict_merge(dct[k], v, add_keys=add_keys)
        else:
            dct[k] = v

all @DomWeldon's tests pass with this
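
A quick sketch of why the second simplification is safe: `dct.get(k)` returns `None` for a missing key, and `None` is never a dict, so both forms agree. The sample dict is made up.

```python
# Hypothetical sample data.
dct = {"a": {"x": 1}, "b": 2}

comparison = {}
for k in ("a", "b", "missing"):
    long_form = k in dct and isinstance(dct[k], dict)
    short_form = isinstance(dct.get(k), dict)
    comparison[k] = (long_form, short_form)

print(comparison)
```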

@mcw0

mcw0 commented Jan 18, 2019


brilliant stuff, needed and will use that in next PoC

@Danielyan86


why does this code use collections.Mapping instead of dict type?

@drts01

drts01 commented Nov 7, 2019


@Danielyan86, good question. It should be dict. dict is a type of Mapping (https://docs.python.org/3/glossary.html#term-mapping), but so is Counter. I think allowing a Counter to be valid would not result in the expected behavior.
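
For anyone weighing the two checks, this sketch prints what each `isinstance` test accepts for a few standard-library types. One detail worth knowing: `Counter` is itself a `dict` subclass, so it passes both checks, while a read-only `MappingProxyType` passes only the `Mapping` check. The sample values are made up.

```python
from collections import Counter
from collections.abc import Mapping
from types import MappingProxyType

# Hypothetical sample values, one per type.
samples = {
    "dict": {"a": 1},
    "Counter": Counter("aab"),
    "MappingProxyType": MappingProxyType({"a": 1}),
}

# For each sample: (passes the dict check, passes the Mapping check)
checks = {
    name: (isinstance(value, dict), isinstance(value, Mapping))
    for name, value in samples.items()
}
for name, (is_dict, is_mapping) in checks.items():
    print(f"{name}: dict={is_dict}, Mapping={is_mapping}")
```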

@drts01

drts01 commented Nov 7, 2019


Here is my take on the function:

def dict_merge(base_dct, merge_dct):
    base_dct.update({
        key: dict_merge(base_dct[key], merge_dct[key])
        if isinstance(base_dct.get(key), dict) and isinstance(merge_dct[key], dict)
        else merge_dct[key]
        for key in merge_dct.keys()
    })
    # Return the merged dict so the recursive call yields a value, not None
    return base_dct

And my take on @DomWeldon's implementation:

def dict_fmerge(base_dct, merge_dct, add_keys=True):
    rtn_dct = base_dct.copy()
    if add_keys is False:
        merge_dct = {key: merge_dct[key] for key in set(rtn_dct).intersection(set(merge_dct))}

    rtn_dct.update({
        key: dict_fmerge(rtn_dct[key], merge_dct[key], add_keys=add_keys)
        if isinstance(rtn_dct.get(key), dict) and isinstance(merge_dct[key], dict)
        else merge_dct[key]
        for key in merge_dct.keys()
    })
    return rtn_dct

https://gist.github.com/CMeza99/5eae3af0776bef32f945f34428669437

@deathbywedgie

deathbywedgie commented May 15, 2020


I like some of @CMeza99's changes to @DomWeldon's implementation, but neither version respects/merges lists, so I hybridized the two and added a few things. Lists are now merged, a TypeError is raised if both dicts have a key in common but with different data types, and the function takes two or more dicts to merge instead of just two.

import collections.abc


def dict_merge(*args, add_keys=True):
    assert len(args) >= 2, "dict_merge requires at least two dicts to merge"
    rtn_dct = args[0].copy()
    merge_dicts = args[1:]
    for merge_dct in merge_dicts:
        if add_keys is False:
            merge_dct = {key: merge_dct[key] for key in set(rtn_dct).intersection(set(merge_dct))}
        for k, v in merge_dct.items():
            if not rtn_dct.get(k):
                rtn_dct[k] = v
            elif k in rtn_dct and type(v) != type(rtn_dct[k]):
                raise TypeError(f"Overlapping keys exist with different types: original is {type(rtn_dct[k])}, new value is {type(v)}")
            elif isinstance(rtn_dct[k], dict) and isinstance(merge_dct[k], collections.abc.Mapping):
                rtn_dct[k] = dict_merge(rtn_dct[k], merge_dct[k], add_keys=add_keys)
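            elif isinstance(v, list):
                # Build a new list so the caller's input list is not mutated
                merged_list = list(rtn_dct[k])
                for list_value in v:
                    if list_value not in merged_list:
                        merged_list.append(list_value)
                rtn_dct[k] = merged_list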
            else:
                rtn_dct[k] = v
    return rtn_dct
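
One thing to watch in the snippet above: the first branch tests `not rtn_dct.get(k)`, which is true for a missing key but also for falsy values such as `0`, `''` or `[]`. Those values would be overwritten without reaching the type check. Whether that is intended is not stated in the comment; `k not in rtn_dct` would be a stricter alternative. A small sketch of the difference, using made-up data:

```python
# Hypothetical sample data.
existing = {"retries": 0, "tags": [], "name": "svc"}

results = {}
for key in ("retries", "tags", "name", "absent"):
    treated_as_missing = not existing.get(key)  # check used in the snippet
    actually_missing = key not in existing      # stricter alternative
    results[key] = (treated_as_missing, actually_missing)
    print(key, treated_as_missing, actually_missing)
```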

@vivainio

vivainio commented Nov 3, 2020


Word of warning: the snippet is licensed under GPL

@DomWeldon


If you find this online and are looking to merge two dictionaries, please see the new dictionary merge features of Python 3.9 as discussed extensively in PEP 584.

@Stargateur


"If you find this online and are looking to merge two dictionaries, please see the new dictionary merge features of Python 3.9 as discussed extensively in PEP 584."

I fail to see how this helps.
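
For context, the `|` operator from PEP 584 (Python 3.9+) is a shallow merge, which is why it does not replace a recursive merge. A minimal sketch, reusing the shape of the test data from earlier in the thread:

```python
a = {"a": 1, "b": {"b1": 2, "b2": 3}}
b = {"b": {"b1": 4}}

# PEP 584 union operator (Python 3.9+): top-level keys only.
merged = a | b

# The whole nested dict under "b" comes from the right operand,
# so "b2" from the left operand is gone.
print(merged)
```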

@tfeldmann

tfeldmann commented Jan 23, 2022


In some of the solutions here the returned dict contains references to the input dicts. This could cause some serious bugs.

Testcase:

a = {}
b = {"a": {"b": 1, "c": 2}}
a = deep_merge(a, b)
assert a == b
b["a"]["b"] = 5
assert a != b

My version which passes this test (MIT license):

from copy import deepcopy

def deep_merge(a: dict, b: dict) -> dict:
    result = deepcopy(a)
    for bk, bv in b.items():
        av = result.get(bk)
        if isinstance(av, dict) and isinstance(bv, dict):
            result[bk] = deep_merge(av, bv)
        else:
            result[bk] = deepcopy(bv)
    return result

@ptrxyz

ptrxyz commented Mar 18, 2022


@tfeldmann I am pretty sure that this:

        av = result.get("k")

should be this:

        av = result.get(bk)

@tfeldmann


Yes of course. Corrected.

@tfeldmann


Everything works fine if merge_dict is empty. Your version doesn't return a copy but the base dict itself, so that could lead to bugs.
