@Hoohm
Last active July 14, 2023 14:45
Create a unique identifier for a file
import hashlib
import os


def create_file_id(file_path, block_size=256):
    '''
    Return a short identifier for a file: the first 10 hex characters of the
    MD5 hash of up to 10 blocks read from the middle of the file, followed by
    the file size in bytes.
    Input: file path as string
    Output: 10 hex characters plus the file size, as a single string
    '''
    file_size = os.path.getsize(file_path)
    start_index = file_size // 2
    md5 = hashlib.md5()
    # Binary mode: hashlib needs bytes, and seek() then uses byte offsets.
    with open(file_path, 'rb') as f:
        f.seek(start_index)
        for _ in range(10):
            data = f.read(block_size)
            if not data:  # reached end of file
                break
            md5.update(data)
    return '{}{}'.format(md5.hexdigest()[:10], file_size)
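A usage sketch, not part of the original gist: because only a window in the middle of the file is hashed, two files of equal size that differ only outside that window get the same identifier. The helper `middle_hash_id` below is a condensed stand-in for `create_file_id` so that the example runs on its own; its name, the `write_temp` helper, and the temporary file names and contents are assumptions made up for illustration.

```python
import hashlib
import os
import tempfile


def middle_hash_id(file_path, block_size=256, blocks=10):
    """Condensed stand-in for create_file_id (hypothetical name)."""
    file_size = os.path.getsize(file_path)
    md5 = hashlib.md5()
    with open(file_path, 'rb') as f:
        f.seek(file_size // 2)
        # One read covering the same window as 10 reads of block_size bytes.
        md5.update(f.read(block_size * blocks))
    return '{}{}'.format(md5.hexdigest()[:10], file_size)


def write_temp(directory, name, content):
    """Write bytes to directory/name and return the path."""
    path = os.path.join(directory, name)
    with open(path, 'wb') as f:
        f.write(content)
    return path


with tempfile.TemporaryDirectory() as tmp:
    base = b'A' * 10000
    a = write_temp(tmp, 'a.bin', base)
    # Same size, differs only in the first byte (outside the sampled window).
    b = write_temp(tmp, 'b.bin', b'B' + base[1:])
    # Same size, differs at byte 5000 (inside the sampled window).
    c = write_temp(tmp, 'c.bin', base[:5000] + b'C' + base[5001:])

    id_a, id_b, id_c = (middle_hash_id(p) for p in (a, b, c))

print(id_a == id_b)  # True: the change was never read
print(id_a == id_c)  # False: the change falls in the hashed window
```

The identifier is therefore best treated as a fast fingerprint rather than a guarantee of uniqueness. If collisions matter, one option is to confirm a match with a hash of the whole file whenever two identifiers agree.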
@mohibulrohman

Thank you
