Here is a query adopted from blog.jooq.org:
explain analyze with
dates(d) as
(select distinct date(ctime)
from "Event"
where modelname='Partner'
and type = 'Update'
and userid = 193
),
groups(d, grp) as
(select distinct
date(d),
date(d) - (dense_rank() over (order by d) :: int)
from dates
)
select
min(d) as first,
max(d) as last,
count(*) as diff
from groups
group by grp
order by diff desc, first;
It is quite fast:
Sort (cost=35833.59..35834.09 rows=200 width=8) (actual time=21.237..21.242 rows=87 loops=1)
Sort Key: (count(*)), (min(groups.d))
Sort Method: quicksort Memory: 31kB
CTE dates
-> HashAggregate (cost=35534.10..35564.32 rows=2417 width=8) (actual time=20.688..20.729 rows=207 loops=1)
-> Index Scan using "Event_userid_idx" on "Event" (cost=0.44..35528.06 rows=2417 width=8) (actual time=0.601..19.254 rows=6469 loops=1)
Index Cond: (userid = 193)
Filter: ((modelname = 'Partner'::text) AND (type = 'Update'::"EventType"))
Rows Removed by Filter: 22012
CTE groups
-> HashAggregate (cost=250.63..253.63 rows=200 width=4) (actual time=21.022..21.044 rows=207 loops=1)
-> WindowAgg (cost=184.16..238.55 rows=2417 width=4) (actual time=20.828..20.971 rows=207 loops=1)
-> Sort (cost=184.16..190.21 rows=2417 width=4) (actual time=20.819..20.839 rows=207 loops=1)
Sort Key: dates.d
Sort Method: quicksort Memory: 34kB
-> CTE Scan on dates (cost=0.00..48.34 rows=2417 width=4) (actual time=20.690..20.778 rows=207 loops=1)
-> HashAggregate (cost=6.00..8.00 rows=200 width=8) (actual time=21.185..21.198 rows=87 loops=1)
-> CTE Scan on groups (cost=0.00..4.00 rows=200 width=8) (actual time=21.026..21.111 rows=207 loops=1)
Total runtime: 21.411 ms
More straightforward approach is to filter consecutive dates by checking if lead(ctime)
is the next date.
explain analyze with
dates(d) as
(select distinct date(ctime)
from "Event"
where modelname='Partner'
and type = 'Update'
and userid = 193
),
seq(d, in_seq) as
(select d, lead(d) over (order by d) = (d + 1) from dates)
select
first.d as first,
min(last.d) as last,
min(last.d) - first.d as diff
from seq first join seq last on (first.d < last.d)
where first.in_seq and not last.in_seq
group by first.d
order by diff desc;
It is a bit shorter and runs as fast as the first one.
Sort (cost=72373.63..72373.88 rows=100 width=8) (actual time=22.176..22.182 rows=120 loops=1)
Sort Key: ((min(last.d) - first.d))
Sort Method: quicksort Memory: 30kB
CTE dates
-> HashAggregate (cost=35534.10..35564.32 rows=2417 width=8) (actual time=16.494..16.525 rows=207 loops=1)
-> Index Scan using "Event_userid_idx" on "Event" (cost=0.44..35528.06 rows=2417 width=8) (actual time=0.297..15.307 rows=6469 loops=1)
Index Cond: (userid = 193)
Filter: ((modelname = 'Partner'::text) AND (type = 'Update'::"EventType"))
Rows Removed by Filter: 22012
CTE seq
-> WindowAgg (cost=184.16..238.55 rows=2417 width=4) (actual time=16.631..16.745 rows=207 loops=1)
-> Sort (cost=184.16..190.21 rows=2417 width=4) (actual time=16.624..16.635 rows=207 loops=1)
Sort Key: dates.d
Sort Method: quicksort Memory: 34kB
-> CTE Scan on dates (cost=0.00..48.34 rows=2417 width=4) (actual time=16.496..16.575 rows=207 loops=1)
-> HashAggregate (cost=36566.20..36567.45 rows=100 width=8) (actual time=22.133..22.152 rows=120 loops=1)
-> Nested Loop (cost=0.00..32918.04 rows=486421 width=8) (actual time=16.640..21.063 rows=5628 loops=1)
Join Filter: (first.d < last.d)
Rows Removed by Join Filter: 4692
-> CTE Scan on seq first (cost=0.00..48.34 rows=1208 width=4) (actual time=16.633..16.654 rows=120 loops=1)
Filter: in_seq
Rows Removed by Filter: 87
-> CTE Scan on seq last (cost=0.00..48.34 rows=1208 width=4) (actual time=0.000..0.025 rows=86 loops=120)
Filter: (NOT in_seq)
Rows Removed by Filter: 121
Total runtime: 22.278 ms