find . -type f -print0 |xargs -0 filefrag |awk -F: '{ gsub("extents", "extent", $2); gsub("extent found", "", $2); print( $2, $1)}' |sort -n
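The awk step assumes each filefrag line looks like "path: N extents found" (or "1 extent found") and splits on ":", so a path that itself contains a colon ends up split across $1 and $2. Below is a minimal Python sketch of the same ranking that anchors on the last ": N extent" in the line instead. It assumes filefrag's default (non-verbose) output format, and parse_filefrag_line and rank_by_extents are hypothetical helper names, not part of these notes:

```python
import re
import subprocess

# Assumes filefrag's default output: "<path>: <N> extent(s) found".
# The greedy (.*) backtracks to the LAST ": N extent", so paths containing ":" survive.
_LINE = re.compile(r"^(.*): (\d+) extents? found")


def parse_filefrag_line(line):
    """Return (extent_count, path) for one filefrag output line, or None if it doesn't match."""
    m = _LINE.match(line)
    if m is None:
        return None
    return int(m.group(2)), m.group(1)


def rank_by_extents(paths):
    """Run filefrag on the given paths and return (count, path) tuples, fewest extents first."""
    out = subprocess.run(["filefrag", *paths], capture_output=True, text=True).stdout
    parsed = (parse_filefrag_line(line) for line in out.splitlines())
    return sorted(p for p in parsed if p is not None)
```

rank_by_extents(["a.img", "b.img"]) returns the same ordering as the sort -n above. For a large tree, pass the paths in batches, as xargs does, to stay under the argument-length limit.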
start by computing ssdeep (fuzzy) hashes of the files;
use them to find "close" matches, i.e. files whose hashes are similar.
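A minimal sketch of turning fuzzy hashes into a list of close pairs. The scoring function is passed in because the real ssdeep comparison is not reimplemented here: stand_in_similarity is a difflib ratio scaled to 0..100, used only so the sketch runs without extra packages, and it is NOT ssdeep's algorithm. close_pairs, stand_in_similarity and the default threshold of 50 are hypothetical choices, not from these notes:

```python
import difflib
from itertools import combinations


def close_pairs(hashes, similarity, threshold=50):
    """Return (score, path_a, path_b) for every pair scoring >= threshold, best first.

    hashes: dict mapping path -> fuzzy-hash string.
    similarity: function(hash_a, hash_b) -> score in 0..100.
    """
    pairs = []
    # every unordered pair of files, compared once
    for (pa, ha), (pb, hb) in combinations(sorted(hashes.items()), 2):
        score = similarity(ha, hb)
        if score >= threshold:
            pairs.append((score, pa, pb))
    pairs.sort(key=lambda t: -t[0])  # stable: ties keep path order
    return pairs


def stand_in_similarity(a, b):
    """NOT ssdeep's algorithm: a difflib similarity ratio scaled to 0..100 for demonstration."""
    return round(100 * difflib.SequenceMatcher(None, a, b).ratio())
```

Assuming the third-party ssdeep Python bindings are installed, their compare function (which returns a 0 to 100 match score) could be passed as similarity in place of the stand-in, with the hashes filled from hash_from_file; that package is an assumption, not something these notes use.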
then, in python:
compare all close matches against each other, pairwise
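import mmap
import os
# f1, f2: paths of one close-match pair
with open(f1, "r+b") as fb1, open(f2, "r+b") as fb2:
    # fileno() must be called; length 0 maps the whole file
    m1 = mmap.mmap(fb1.fileno(), 0)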
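Once a pair is mapped, one way to compare it is block by block. The sketch below assumes the goal is to find identical block-aligned regions (for example, as deduplication candidates) and assumes a 4096-byte block size; shared_block_offsets, shared_blocks and BLOCK are hypothetical names, not from these notes. It maps both files read-only, since the comparison never writes:

```python
import mmap
import os

BLOCK = 4096  # assumed block size; check the filesystem's actual block size


def shared_block_offsets(buf_a, buf_b, block=BLOCK):
    """Offsets of aligned blocks whose bytes are identical in both buffers.

    Works on bytes or mmap objects. Only same-offset blocks are compared,
    so content shifted between the two files is not detected. The last
    block may be partial.
    """
    size = min(len(buf_a), len(buf_b))
    return [off for off in range(0, size, block)
            if buf_a[off:off + block] == buf_b[off:off + block]]


def shared_blocks(path_a, path_b, block=BLOCK):
    """Map both files read-only and return their shared aligned-block offsets."""
    if os.path.getsize(path_a) == 0 or os.path.getsize(path_b) == 0:
        return []  # mmap cannot map an empty file
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        with mmap.mmap(fa.fileno(), 0, access=mmap.ACCESS_READ) as ma, \
             mmap.mmap(fb.fileno(), 0, access=mmap.ACCESS_READ) as mb:
            return shared_block_offsets(ma, mb, block)
```

The number of shared offsets relative to the pair's block count gives a rough measure of overlap. Aligned comparison misses content that is shifted by an amount that isn't a multiple of the block size.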