Created
March 14, 2020 19:49
-
-
Save hui-zheng/b52eaebaff6737385e3791fb89ea897f to your computer and use it in GitHub Desktop.
[BigQuery Advanced SQL] the most flexible script to detect and display duplicate records and remove duplicates (dedup)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- base_time has t | |
WITH rows_by_key AS( | |
SELECT | |
surrogate_key, | |
array_agg(base_table) as _rows, | |
count(*) as _count | |
FROM `gcp_project.data_set.original_table` as base_table | |
WHERE stamp BETWEEN "2020-03-12T00:00:00" AND "2020-03-14T00:00:00" | |
GROUP BY surrogate_key | |
) | |
SELECT | |
* | |
FROM rows_by_key | |
where _count > 1 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment