Skip to content

Instantly share code, notes, and snippets.

@jodok
Last active July 27, 2017 15:04
Show Gist options
  • Save jodok/551772cab8ca67f84c396517ef2796fc to your computer and use it in GitHub Desktop.
Save jodok/551772cab8ca67f84c396517ef2796fc to your computer and use it in GitHub Desktop.
create table j.test (
id integer,
status string,
uri string,
created_at timestamp)
insert into j.test (id, status, uri, create_at) values
(1, 'START', 'file://myfile.txt', 1),
(1, 'PROGRESS', NULL, 2),
(1, 'PROGRESS', NULL, 2),
(1, 'OK', NULL, 4),
(2, 'START', 'file://myfile.txt', 2),
(2, 'PROGRESS', NULL, 3),
(2, 'PROGRESS', NULL, 4),
(2, 'FAIL', NULL, 8),
(3, 'START', NULL, 3),
(3, 'PROGRESS', NULL, 4),
(3, 'PROGRESS', NULL, 4),
(3, 'OK', NULL, 5)
SELECT a.id, MAX(a.uri) as u, count(*) FROM (select id, uri, count(*) from j.test GROUP BY id, uri) AS a GROUP BY a.id;
SELECT id, count(*) FROM j.test WHERE id IN (select id from j.test WHERE uri != '') GROUP BY id;
@matriv
Copy link

matriv commented Jul 27, 2017

"Manual semi-join":
SELECT t1.id, count(*) FROM j.test t1 INNER JOIN (select distinct id from j.test where uri IS NOT NULL AND uri != '') t2 on t1.id = t2.id GROUP BY t1.id
or maybe: SELECT id, count(*) FROM j.test where id = any(select collect_set(id) from j.test where uri != '') GROUP BY id
performs better (until real in (subquery) functionality is implemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment