Created
June 17, 2015 02:04
-
-
Save jkburges/85014d561393e1427e0a to your computer and use it in GitHub Desktop.
Check duplicate counting
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
select * from ( | |
select count(*), d.timestamp, d.receiver_name, d.transmitter_id from detection d | |
join ( | |
select timestamp, receiver_name, transmitter_id from detection | |
where receiver_download_id = 127971115 | |
and duplicate = true | |
) dups | |
on d.timestamp = dups.timestamp | |
and d.receiver_name = dups.receiver_name | |
and d.transmitter_id = dups.transmitter_id | |
group by d.timestamp, d.receiver_name, d.transmitter_id | |
order by d.timestamp, d.receiver_name, d.transmitter_id | |
) dup_counts | |
where count = 1 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@dnahodil I believe this query confirms that duplicate counting is correct.
The algorithm is basically, for every duplicate in the particular upload, count how many detections in total match on each timestamp/receiver/transmitter combination.
There are no results for this query, as expected. Any results would indicate that for each duplicate in the upload, there is only one matching detection - i.e. not really a duplicate.