@magnetikonline
Last active January 22, 2025 19:54
Python - hashing JSON data structures.

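The function hash_json(data) accepts a structure loaded by json.load() and returns a SHA-1 hex digest of it. Dictionary keys are visited in sorted order, so the digest does not depend on the order in which keys appear in the source file.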

Example

$ ./hashjson.py
Hash a.json: 8212462b8e9ce805cac2f0758127c5cfd7710baf
Hash b.json: 8212462b8e9ce805cac2f0758127c5cfd7710baf

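The JSON files a.json and b.json are loaded via load_json() and the resulting structures are passed to hash_json(). Both files hold the same data with the keys in a different order, so the two hashes match.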

a.json

{
  "apples": "oranges",
  "grapes": [
    1,
    2,
    3,
    4
  ],
  "pears": {
    "one": 1,
    "two": 2,
    "three": 3
  }
}

b.json

{
  "pears": {
    "three": 3,
    "two": 2,
    "one": 1
  },
  "apples": "oranges",
  "grapes": [
    1,
    2,
    3,
    4
  ]
}

#!/usr/bin/env python3

import hashlib
import json
from typing import Any


def load_json(file_path: str) -> Any:
    # open JSON file and parse contents; the file is closed on exit
    with open(file_path, "r") as fh:
        return json.load(fh)


def hash_json(data: Any) -> str:
    def hasher(value: Any) -> None:
        if type(value) is list:
            # hash each item within the list
            for item in value:
                hasher(item)
            return

        if type(value) is dict:
            # work over each property in the dictionary, using a sorted order
            for item_key in sorted(value.keys()):
                # hash both the property key and the value
                digest.update(item_key.encode())
                hasher(value[item_key])
            return

        # numbers, booleans and None are hashed via their string form
        if type(value) is not str:
            value = str(value)

        digest.update(value.encode())

    # create new hash, walk given data and return result
    digest = hashlib.sha1()
    hasher(data)
    return digest.hexdigest()


def main():
    # import testing JSON files to Python structures
    a_json = load_json("a.json")
    b_json = load_json("b.json")

    # hash each JSON structure
    print(f"Hash a.json: {hash_json(a_json)}")
    print(f"Hash b.json: {hash_json(b_json)}")


if __name__ == "__main__":
    main()

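
For comparison, a different route to an order-independent hash is to serialize the structure to canonical JSON text and hash that text. The sketch below is an assumption, not part of the script above: hash_json_canonical is a hypothetical name, and its digests will not match those of hash_json() because the bytes being hashed are different.

```python
import hashlib
import json
from typing import Any


def hash_json_canonical(data: Any) -> str:
    # Hypothetical helper (not from the gist): sorted keys and fixed
    # separators make equal structures serialize to identical text.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode()).hexdigest()


# Same data as a.json and b.json, written inline to stay self-contained.
a = {"apples": "oranges", "grapes": [1, 2, 3, 4], "pears": {"one": 1, "two": 2, "three": 3}}
b = {"pears": {"three": 3, "two": 2, "one": 1}, "apples": "oranges", "grapes": [1, 2, 3, 4]}

print(hash_json_canonical(a) == hash_json_canonical(b))  # True
```

Because the serialized text keeps brackets, braces and quotes, a list and a dict with the same strings serialize differently under this approach.
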
@Danny-06

I was looking for exactly this, but I noticed a problem.

If you have, for example, a dict with one entry whose key and value are the same string, and a list containing that same string twice, both will return the same hash.

json_object = {
  'example': 'example',
}

json_list = [
  'example',
  'example',
]

hash_object = hash_json(json_object)
hash_list = hash_json(json_list)

print(f'Are hashes equal: {hash_object == hash_list}') # True
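
The reason can be seen by collecting the bytes that reach the hash. The sketch below assumes the same traversal as hash_json(); fed_bytes is a hypothetical helper that returns the concatenated input instead of hashing it. No structural markers are ever fed in, so different structures can produce the same byte stream.

```python
from typing import Any


def fed_bytes(value: Any) -> bytes:
    # Hypothetical helper: the bytes hash_json() would pass to update(),
    # in order, following the same traversal as the gist.
    if type(value) is list:
        return b"".join(fed_bytes(item) for item in value)
    if type(value) is dict:
        return b"".join(
            key.encode() + fed_bytes(value[key]) for key in sorted(value.keys())
        )
    if type(value) is not str:
        value = str(value)
    return value.encode()


print(fed_bytes({"example": "example"}))  # b'exampleexample'
print(fed_bytes(["example", "example"]))  # b'exampleexample'

# The same effect shows up in other shapes as well:
print(fed_bytes(["ab", "c"]) == fed_bytes(["a", "bc"]))  # True
print(fed_bytes([1]) == fed_bytes(["1"]))  # True
```
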

This could be solved by feeding an arbitrary marker character into the hash before hashing a list, so that a list and a dict no longer produce the same input.
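
A minimal sketch of that idea, taken one step further: every value is fed with a one-byte type tag and a length prefix, so lists, dicts, keys, strings and other scalars all stay distinguishable. The name hash_json_tagged and the tag letters are assumptions for illustration, and the digests will differ from those of the original hash_json().

```python
import hashlib
from typing import Any


def hash_json_tagged(data: Any) -> str:
    digest = hashlib.sha1()

    def feed(tag: bytes, payload: bytes) -> None:
        # tag + payload length + ":" + payload keeps adjacent values apart
        digest.update(tag + str(len(payload)).encode() + b":" + payload)

    def walk(value: Any) -> None:
        if type(value) is list:
            # "l" marks a list, followed by its item count
            feed(b"l", str(len(value)).encode())
            for item in value:
                walk(item)
        elif type(value) is dict:
            # "d" marks a dict; keys are still visited in sorted order
            feed(b"d", str(len(value)).encode())
            for key in sorted(value.keys()):
                feed(b"k", key.encode())
                walk(value[key])
        elif type(value) is str:
            feed(b"s", value.encode())
        else:
            # numbers, booleans and None, hashed via their string form
            feed(b"v", str(value).encode())

    walk(data)
    return digest.hexdigest()


json_object = {"example": "example"}
json_list = ["example", "example"]

print(hash_json_tagged(json_object) == hash_json_tagged(json_list))  # False
```

Key order still has no effect on the result, since dictionary keys are sorted before being fed to the hash.
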
