Skip to content

Instantly share code, notes, and snippets.

@jorpic
Last active February 21, 2016 20:53
Show Gist options
  • Save jorpic/af2f9b92202aa5250f79 to your computer and use it in GitHub Desktop.
Save jorpic/af2f9b92202aa5250f79 to your computer and use it in GitHub Desktop.
Longest streak events

Here is a query adopted from blog.jooq.org:

explain analyze with
  dates(d) as
    (select distinct date(ctime)
      from "Event"
      where modelname='Partner'
        and type = 'Update'
        and userid = 193
    ),
  groups(d, grp) as
    (select distinct
        date(d),
        date(d) - (dense_rank() over (order by d) :: int)
      from dates
    )
  select
      min(d) as first,
      max(d) as last,
      count(*) as diff
    from groups
    group by grp
    order by diff desc, first;

It is quite fast:

 Sort  (cost=35833.59..35834.09 rows=200 width=8) (actual time=21.237..21.242 rows=87 loops=1)
   Sort Key: (count(*)), (min(groups.d))
   Sort Method: quicksort  Memory: 31kB
   CTE dates
     ->  HashAggregate  (cost=35534.10..35564.32 rows=2417 width=8) (actual time=20.688..20.729 rows=207 loops=1)
           ->  Index Scan using "Event_userid_idx" on "Event"  (cost=0.44..35528.06 rows=2417 width=8) (actual time=0.601..19.254 rows=6469 loops=1)
                 Index Cond: (userid = 193)
                 Filter: ((modelname = 'Partner'::text) AND (type = 'Update'::"EventType"))
                 Rows Removed by Filter: 22012
   CTE groups
     ->  HashAggregate  (cost=250.63..253.63 rows=200 width=4) (actual time=21.022..21.044 rows=207 loops=1)
           ->  WindowAgg  (cost=184.16..238.55 rows=2417 width=4) (actual time=20.828..20.971 rows=207 loops=1)
                 ->  Sort  (cost=184.16..190.21 rows=2417 width=4) (actual time=20.819..20.839 rows=207 loops=1)
                       Sort Key: dates.d
                       Sort Method: quicksort  Memory: 34kB
                       ->  CTE Scan on dates  (cost=0.00..48.34 rows=2417 width=4) (actual time=20.690..20.778 rows=207 loops=1)
   ->  HashAggregate  (cost=6.00..8.00 rows=200 width=8) (actual time=21.185..21.198 rows=87 loops=1)
         ->  CTE Scan on groups  (cost=0.00..4.00 rows=200 width=8) (actual time=21.026..21.111 rows=207 loops=1)
 Total runtime: 21.411 ms

More straightforward approach is to filter consecutive dates by checking if lead(ctime) is the next date.

explain analyze with
  dates(d) as
    (select distinct date(ctime)
      from "Event"
      where modelname='Partner'
        and type = 'Update'
        and userid = 193
    ),
  seq(d, in_seq) as
    (select d, lead(d) over (order by d) = (d + 1) from dates)
  select
      first.d as first,
      min(last.d) as last,
      min(last.d) - first.d as diff
    from seq first join seq last on (first.d < last.d)
    where first.in_seq and not last.in_seq
    group by first.d
    order by diff desc;

It is a bit shorter and runs as fast as the first one.

 Sort  (cost=72373.63..72373.88 rows=100 width=8) (actual time=22.176..22.182 rows=120 loops=1)
   Sort Key: ((min(last.d) - first.d))
   Sort Method: quicksort  Memory: 30kB
   CTE dates
     ->  HashAggregate  (cost=35534.10..35564.32 rows=2417 width=8) (actual time=16.494..16.525 rows=207 loops=1)
           ->  Index Scan using "Event_userid_idx" on "Event"  (cost=0.44..35528.06 rows=2417 width=8) (actual time=0.297..15.307 rows=6469 loops=1)
                 Index Cond: (userid = 193)
                 Filter: ((modelname = 'Partner'::text) AND (type = 'Update'::"EventType"))
                 Rows Removed by Filter: 22012
   CTE seq
     ->  WindowAgg  (cost=184.16..238.55 rows=2417 width=4) (actual time=16.631..16.745 rows=207 loops=1)
           ->  Sort  (cost=184.16..190.21 rows=2417 width=4) (actual time=16.624..16.635 rows=207 loops=1)
                 Sort Key: dates.d
                 Sort Method: quicksort  Memory: 34kB
                 ->  CTE Scan on dates  (cost=0.00..48.34 rows=2417 width=4) (actual time=16.496..16.575 rows=207 loops=1)
   ->  HashAggregate  (cost=36566.20..36567.45 rows=100 width=8) (actual time=22.133..22.152 rows=120 loops=1)
         ->  Nested Loop  (cost=0.00..32918.04 rows=486421 width=8) (actual time=16.640..21.063 rows=5628 loops=1)
               Join Filter: (first.d < last.d)
               Rows Removed by Join Filter: 4692
               ->  CTE Scan on seq first  (cost=0.00..48.34 rows=1208 width=4) (actual time=16.633..16.654 rows=120 loops=1)
                     Filter: in_seq
                     Rows Removed by Filter: 87
               ->  CTE Scan on seq last  (cost=0.00..48.34 rows=1208 width=4) (actual time=0.000..0.025 rows=86 loops=120)
                     Filter: (NOT in_seq)
                     Rows Removed by Filter: 121
 Total runtime: 22.278 ms
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment