Created
August 13, 2024 13:15
-
-
Save vhsu/68b29765175298ff71a81c5fa9f7bf61 to your computer and use it in GitHub Desktop.
Script to Remove Duplicate Images from a Folder This script identifies and removes duplicate images within a specified folder. It uses perceptual hashing to generate a unique fingerprint for each image, allowing it to detect duplicates even if the files have different names or formats.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Script to Remove Duplicate Images from a Folder | |
This script identifies and removes duplicate images within a specified folder. | |
It uses perceptual hashing to generate a unique fingerprint for each image, | |
allowing it to detect duplicates even if the files have different names or formats. | |
What the Code Does: | |
------------------- | |
1. Scans the specified folder for image files (supports common formats such as | |
.jpg, .jpeg, .png, .bmp, .gif, .tiff, .webp). | |
2. Computes a perceptual hash for each image to identify duplicates. | |
3. If two images share the same hash, one of the duplicates is removed. | |
4. Prints the names of the duplicate files that were found and removed. | |
How to Use: | |
----------- | |
1. Save this script as a Python file, e.g., `remove_duplicates.py`. | |
2. Run the script using Python. | |
3. You will be prompted to enter the path of the folder containing the images | |
you want to check for duplicates. | |
4. The script will process the images in the folder, removing any duplicates | |
it finds and printing the results to the console. | |
Example: | |
-------- | |
$ python remove_duplicates.py | |
Enter the path to the folder: /path/to/your/image/folder | |
Dependencies: | |
------------- | |
- Python 3.x | |
- PIL (Pillow) | |
- imagehash | |
Make sure to install the required packages using pip if they are not already installed: | |
$ pip install pillow imagehash | |
""" | |
import os | |
from PIL import Image | |
import imagehash | |
def image_hash(filepath): | |
"""Returns the perceptual hash of the image file. | |
Args: | |
filepath (str): The path to the image file. | |
Returns: | |
str: The perceptual hash of the image. | |
""" | |
with Image.open(filepath) as img: | |
return str(imagehash.phash(img)) | |
def main(folder_path): | |
"""Removes duplicate images in the specified folder. | |
This function computes the perceptual hash of each image in the given folder | |
and deletes any images that have the same hash (i.e., duplicate images). | |
Args: | |
folder_path (str): The path to the folder containing images. | |
""" | |
if not os.path.exists(folder_path): | |
print("Folder not found!") | |
return | |
duplicates = [] | |
hash_keys = dict() | |
# Define a set of common image file extensions | |
image_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tiff', '.webp'} | |
# Iterate through each image in the folder | |
for filename in os.listdir(folder_path): | |
file_path = os.path.join(folder_path, filename) | |
# Check if the file is an image by its extension | |
if os.path.isfile(file_path) and os.path.splitext(filename)[1].lower() in image_extensions: | |
try: | |
img_hash = image_hash(file_path) | |
if img_hash in hash_keys: | |
print(f"Duplicate found: {filename}") | |
duplicates.append(file_path) | |
else: | |
hash_keys[img_hash] = filename | |
except IOError: | |
# Handle the exception if the file is not a valid image | |
print(f"Cannot open {filename} as an image.") | |
# Remove duplicates | |
for duplicate_file in duplicates: | |
os.remove(duplicate_file) | |
print(f"Removed: {duplicate_file}") | |
print("Duplicate removal complete.") | |
if __name__ == '__main__': | |
folder_path = input("Enter the path to the folder: ") # Prompt the user for the folder path | |
main(folder_path) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment