Description: A Jupyter Notebook for the unsupervised clustering of faces in a directory, specifically optimized for biologically related individuals where the true number of unique identities is unknown.
How it works:
- Embeddings: Extracts facial embeddings using InsightFace (utilizing the antelopev2 model for maximum discriminative power between family members).
- Distance Calculation: Computes a pairwise cosine distance matrix.
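- Dynamic Thresholding: Uses Agglomerative Clustering with a strict distance threshold instead of a fixed number of clusters. The threshold is set at the valley of the bimodal distribution of pairwise distances, which separates same-identity pairs (low distances) from different-identity pairs (high distances).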
- Outputs: Generates a Pandas DataFrame mapping file paths to their respective cluster IDs.
- Visualization: Computes the geometric centroid of each group to visualize the top 3 representative images for valid clusters (ignoring small groups with fewer than 4 members).
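
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: synthetic unit vectors stand in for InsightFace embeddings, SciPy's average-linkage hierarchy stands in for whichever agglomerative implementation the notebook uses, the valley is located with a Gaussian kernel density estimate (one possible approach), and representatives are ranked by Euclidean distance to the centroid. All function names, column names, and file paths are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.signal import argrelmin
from scipy.spatial.distance import pdist, squareform
from scipy.stats import gaussian_kde


def valley_threshold(distances, grid_size=512):
    """Return the deepest local minimum of a KDE fitted to the distances."""
    grid = np.linspace(distances.min(), distances.max(), grid_size)
    density = gaussian_kde(distances)(grid)
    minima = argrelmin(density)[0]  # indices of interior local minima
    if minima.size == 0:
        raise ValueError("distance distribution does not look bimodal")
    return float(grid[minima[np.argmin(density[minima])]])


def representatives(embeddings, labels, top_k=3, min_size=4):
    """Map each sufficiently large cluster to the rows nearest its centroid."""
    reps = {}
    for cluster_id in np.unique(labels):
        idx = np.flatnonzero(labels == cluster_id)
        if len(idx) < min_size:  # ignore small groups
            continue
        centroid = embeddings[idx].mean(axis=0)
        dist_to_centroid = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        reps[int(cluster_id)] = idx[np.argsort(dist_to_centroid)[:top_k]].tolist()
    return reps


# Synthetic stand-in for face embeddings: three identities with six images
# each, plus one identity with only two images (assumed data, not real faces).
rng = np.random.default_rng(0)
dim = 64
true_ids = np.repeat([0, 1, 2, 3], [6, 6, 6, 2])
embeddings = np.eye(4, dim)[true_ids] + 0.03 * rng.normal(size=(len(true_ids), dim))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Pairwise cosine distances: condensed vector for clustering, square matrix for inspection.
condensed = pdist(embeddings, metric="cosine")
dist_matrix = squareform(condensed)

# Cut the average-linkage hierarchy at the valley of the distance distribution.
threshold = valley_threshold(condensed)
labels = fcluster(linkage(condensed, method="average"), t=threshold, criterion="distance")

# Hypothetical file paths mapped to cluster IDs.
paths = [f"faces/img_{i:03d}.jpg" for i in range(len(labels))]
df = pd.DataFrame({"file_path": paths, "cluster_id": labels})

# Row indices of the top 3 representatives for each cluster with at least 4 members.
reps = representatives(embeddings, labels)
```

In this sketch the number of clusters is never supplied; it falls out of where the threshold cuts the hierarchy. The two-image identity still receives a cluster ID in `df` but is skipped by `representatives` because it has fewer than 4 members.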