Skip to content

Instantly share code, notes, and snippets.

@victorono
Last active November 14, 2024 08:39
Show Gist options
  • Save victorono/cd9d14b013487b8b22975512225c5b4c to your computer and use it in GitHub Desktop.
Save victorono/cd9d14b013487b8b22975512225c5b4c to your computer and use it in GitHub Desktop.
Django - remove duplicate objects where there is more than one field to compare
from django.db.models import Count, Max
unique_fields = ['field_1', 'field_2']
duplicates = (
MyModel.objects.values(*unique_fields)
.order_by()
.annotate(max_id=Max('id'), count_id=Count('id'))
.filter(count_id__gt=1)
)
for duplicate in duplicates:
(
MyModel.objects
.filter(**{x: duplicate[x] for x in unique_fields})
.exclude(id=duplicate['max_id'])
.delete()
)
@ahmed-zubair-1998
Copy link

Great starting point. Tried to reduce the number of queries without using raw SQL.

def remove_duplicates_from_table(model, lookup_fields):
    duplicates = (
        model.objects.values(*lookup_fields)
        .order_by()
        .annotate(min_id=Min('id'), count_id=Count('id'))
        .filter(count_id__gt=1)
    )

    fields_lookup = Q()
    duplicate_fields_values = duplicates.values(*lookup_fields)
    for val in duplicate_fields_values:
        fields_lookup |= Q(**val)
    min_ids_list = duplicates.values_list('min_id', flat=True)

    if fields_lookup:
        model.objects.filter(fields_lookup).exclude(id__in=min_ids_list).delete()

Ended up using Q object to avoid making select query in each iteration while looping over the duplicates list.

@DArmstrong87
Copy link

Thank you so much! This is EXACTLY what I needed to fix an issue with one of my migrations taking forever.

@victorono
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment