Skip to content

Instantly share code, notes, and snippets.

@jkburges
Created June 17, 2015 02:04
Show Gist options
  • Save jkburges/85014d561393e1427e0a to your computer and use it in GitHub Desktop.
Save jkburges/85014d561393e1427e0a to your computer and use it in GitHub Desktop.
Check duplicate counting
select * from (
select count(*), d.timestamp, d.receiver_name, d.transmitter_id from detection d
join (
select timestamp, receiver_name, transmitter_id from detection
where receiver_download_id = 127971115
and duplicate = true
) dups
on d.timestamp = dups.timestamp
and d.receiver_name = dups.receiver_name
and d.transmitter_id = dups.transmitter_id
group by d.timestamp, d.receiver_name, d.transmitter_id
order by d.timestamp, d.receiver_name, d.transmitter_id
) dup_counts
where count = 1
@jkburges
Copy link
Author

@dnahodil I believe this query confirms that duplicate counting is correct.
The algorithm is basically, for every duplicate in the particular upload, count how many detections in total match on each timestamp/receiver/transmitter combination.

There are no results for this query, as expected. Any results would indicate that for each duplicate in the upload, there is only one matching detection - i.e. not really a duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment