Three Python diff libraries were evaluated for comparing resource revisions.
- jsonpatch
- dictdiffer
- DeepDiff
Additionally, we considered rolling our own diff that is specific to Custodian's needs.
On the whole, jsonpatch does a good job of producing a minimal diff that matches the semantic changes. There are some bugs reported on the repo that need investigation.
- Url https://github.com/stefankoegl/python-json-patch
- License BSD
[{u'op': u'replace', u'path': u'/IpPermissions/0/ToPort', u'value': 80},
 {u'op': u'add',
  u'path': u'/Tags/1',
  u'value': {'Key': 'AppId', 'Value': 'SomethingGood'}}]
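To make the output above easier to read, here is a minimal stdlib-only sketch of how such operations are interpreted. It is not the jsonpatch implementation: `apply_ops` is a hypothetical helper that handles only `add` and `replace`, ignores JSON Pointer escapes and the `-` append index, and the first tag in `before` is invented for illustration.

```python
import copy


def apply_ops(doc, ops):
    """Apply 'add'/'replace' operations (RFC 6902 style) to a copy of doc."""
    result = copy.deepcopy(doc)
    for op in ops:
        # '/Tags/1' -> ['Tags', '1']; pointer escapes (~0, ~1) are not handled.
        parts = op['path'].split('/')[1:]
        parent = result
        for part in parts[:-1]:
            parent = parent[int(part)] if isinstance(parent, list) else parent[part]
        last = parts[-1]
        if op['op'] not in ('add', 'replace'):
            raise ValueError('unsupported op: %s' % op['op'])
        if isinstance(parent, list):
            if op['op'] == 'add':
                # 'add' on a list inserts and shifts later items to the right.
                parent.insert(int(last), op['value'])
            else:
                parent[int(last)] = op['value']
        else:
            parent[last] = op['value']
    return result


before = {
    'IpPermissions': [{'FromPort': 53, 'ToPort': 53, 'IpProtocol': 'tcp'}],
    'Tags': [{'Key': 'Name', 'Value': 'example'},  # hypothetical first tag
             {'Key': 'Origin', 'Value': 'Name'}],
}
ops = [
    {'op': 'replace', 'path': '/IpPermissions/0/ToPort', 'value': 80},
    {'op': 'add', 'path': '/Tags/1',
     'value': {'Key': 'AppId', 'Value': 'SomethingGood'}},
]
after = apply_ops(before, ops)
print([t['Key'] for t in after['Tags']])  # ['Name', 'AppId', 'Origin']
```

The `add` at `/Tags/1` is an insertion, so the existing `Origin` tag simply moves to index 2. That is why this diff stays minimal.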
- Url https://github.com/inveniosoftware/dictdiffer
- License MIT
[('change', ['IpPermissions', 0, 'ToPort'], (53, 80)),
 ('change', ['Tags', 1, 'Value'], ('Name', 'SomethingGood')),
 ('change', ['Tags', 1, 'Key'], ('Origin', 'AppId')),
 ('add', 'Tags', [(2, {'Key': 'Origin', 'Value': 'Name'})])]
The change reported by dictdiffer is correct, but it requires some semantic interpretation. dictdiffer treats position within a list as significant, so it reports the inserted tag as in-place mutations of the element at index 1 plus an addition at index 2. In all of our circumstances we want the semantic delta on a list rather than a mutation in place.
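As a rough illustration of what a "semantic delta on a list" could look like, the sketch below compares two lists as unordered collections and reports whole items added or removed. `list_delta` is a hypothetical helper, not part of any of the libraries evaluated; it assumes items are JSON-serializable and collapses duplicate items. The first tag is invented for illustration.

```python
import json


def list_delta(old_items, new_items):
    """Return (added, removed), ignoring the position of items in each list."""
    def key(item):
        # Canonical string form so dicts with the same content compare equal.
        return json.dumps(item, sort_keys=True)

    old_by_key = {key(item): item for item in old_items}
    new_by_key = {key(item): item for item in new_items}
    added = [new_by_key[k] for k in new_by_key if k not in old_by_key]
    removed = [old_by_key[k] for k in old_by_key if k not in new_by_key]
    return added, removed


old_tags = [{'Key': 'Name', 'Value': 'example'},  # hypothetical first tag
            {'Key': 'Origin', 'Value': 'Name'}]
new_tags = [{'Key': 'Name', 'Value': 'example'},
            {'Key': 'AppId', 'Value': 'SomethingGood'},
            {'Key': 'Origin', 'Value': 'Name'}]

added, removed = list_delta(old_tags, new_tags)
print(added)    # [{'Key': 'AppId', 'Value': 'SomethingGood'}]
print(removed)  # []
```

Under these assumptions the inserted tag shows up as a single addition, with no positional changes reported for the tag that merely shifted.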
- Url https://github.com/seperman/deepdiff
- License MIT
{'iterable_item_added': {"root['IpPermissions'][0]": {'FromPort': 53,
                                                      'IpProtocol': 'tcp',
                                                      'IpRanges': ['10.0.0.0/8'],
                                                      'PrefixListIds': [],
                                                      'ToPort': 80,
                                                      'UserIdGroupPairs': []},
                         "root['Tags'][1]": {'Key': 'AppId',
                                             'Value': 'SomethingGood'}},
 'iterable_item_removed': {"root['IpPermissions'][0]": {'FromPort': 53,
                                                        'IpProtocol': 'tcp',
                                                        'IpRanges': ['10.0.0.0/8'],
                                                        'PrefixListIds': [],
                                                        'ToPort': 53,
                                                        'UserIdGroupPairs': []}}}
DeepDiff is fairly configurable; the only non-default parameter used here is ignore_order.
The structure of the returned diff is quite obtuse and idiosyncratic.
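One way to make that structure easier to work with is to parse the path strings and regroup the entries by top-level resource key. The sketch below is stdlib-only and does not call DeepDiff; `parse_path` and `group_by_top_level` are hypothetical helpers, and they assume every entry is a mapping of path string to value, as in the sample above. Other DeepDiff result types may be shaped differently.

```python
import re

# Matches ['name'] or [0] segments in a path such as "root['Tags'][1]".
_SEGMENT = re.compile(r"\[(?:'([^']*)'|(\d+))\]")


def parse_path(path):
    """Turn "root['Tags'][1]" into ['Tags', 1]."""
    if not path.startswith('root'):
        raise ValueError('unexpected path: %r' % path)
    return [int(index) if index else name
            for name, index in _SEGMENT.findall(path)]


def group_by_top_level(diff):
    """Regroup {change_type: {path: value}} by the first key in each path."""
    grouped = {}
    for change_type, entries in diff.items():
        for path, value in entries.items():
            keys = parse_path(path)
            grouped.setdefault(keys[0], []).append((change_type, keys[1:], value))
    return grouped


sample = {
    'iterable_item_added': {
        "root['Tags'][1]": {'Key': 'AppId', 'Value': 'SomethingGood'},
    },
}
print(parse_path("root['IpPermissions'][0]"))  # ['IpPermissions', 0]
print(group_by_top_level(sample))
```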
The issue with most of the diff libraries is that they require significant interpretation to line up with the API call semantics around any given resource. For example, a security group rule is effectively immutable, so a modification that a diff library represents as a 'change' requires removing the original rule and adding the modified one.
With DeepDiff, even though the entire object gets returned, that is still usable when making a call back to AWS. When I create a rule in AWS, I need to know all of the fields listed in that object. Knowing only the field that changed doesn't help.
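The interpretation step for immutable rules might be sketched as follows. It takes field-level 'change' entries in the tuple shape shown in the dictdiffer output and turns them into whole-rule removals and additions. `rule_operations` and the `revoke`/`authorize` labels are hypothetical names chosen for illustration, and the sketch assumes rules keep the same list position in both revisions.

```python
def rule_operations(old, new, changes, key='IpPermissions'):
    """Map field-level 'change' entries to whole-rule revoke/authorize lists."""
    touched = []
    for action, path, _values in changes:
        # Only positional changes under the rule list are relevant here,
        # e.g. ('change', ['IpPermissions', 0, 'ToPort'], (53, 80)).
        if (action == 'change' and isinstance(path, list)
                and len(path) >= 2 and path[0] == key):
            if path[1] not in touched:
                touched.append(path[1])
    return {
        'revoke': [old[key][index] for index in touched],     # original rules
        'authorize': [new[key][index] for index in touched],  # modified rules
    }


old = {'IpPermissions': [{'FromPort': 53, 'ToPort': 53, 'IpProtocol': 'tcp',
                          'IpRanges': ['10.0.0.0/8']}]}
new = {'IpPermissions': [{'FromPort': 53, 'ToPort': 80, 'IpProtocol': 'tcp',
                          'IpRanges': ['10.0.0.0/8']}]}
changes = [('change', ['IpPermissions', 0, 'ToPort'], (53, 80))]

plan = rule_operations(old, new, changes)
print(plan['revoke'][0]['ToPort'], plan['authorize'][0]['ToPort'])  # 53 80
```

Each entry in the plan carries the complete rule, since a single changed field is not enough to remove or recreate it.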