Skip to content

Instantly share code, notes, and snippets.

@dmagda
Last active October 24, 2022 03:20
Show Gist options
  • Save dmagda/e3483fa54ccf1b97ea59ef8f06351b13 to your computer and use it in GitHub Desktop.
Save dmagda/e3483fa54ccf1b97ea59ef8f06351b13 to your computer and use it in GitHub Desktop.
A little demo for the BRIN vs. B-tree index in PostgreSQL

BRIN vs B-tree: a little demo

If you're not familiar with BRIN, check these easy-to-follow articles: https://www.crunchydata.com/blog/postgres-indexing-when-does-brin-win https://medium.com/geekculture/postgres-brin-index-large-data-performance-with-minimal-storage-4db6b...

Start a postgres instance and connect to it with psql or your favorite tool. And then follow the steps below.

  1. Create a table and fill in with dummy date:

    create table sensor (id int, value int, received_time timestamp);
    
    CREATE OR REPLACE FUNCTION random_between(low INT ,high INT) 
    RETURNS INT AS
    $$
    BEGIN
       RETURN floor(random()* (high-low + 1) + low);
    END;
    $$ language 'plpgsql' STRICT;
  2. Generate dummy data:

    insert into sensor 
    select random_between (1,500), /* generating random sensor ID */
           random_between (0,200), /* generating random value reported by a sensor */
           generate_series('2022-10-24'::timestamp,'2023-10-24'::timestamp, '1 minute') /* generate time when the measurement is reported */;
           
    /* refreshing statistics */
    analyze sensor;
  3. Querying data for a specific range. Postgres does a full scan:

    explain analyze select * from sensor where received_time 
    between '2022-12-12'::timestamp and '2023-2-15'::timestamp;
    
     Seq Scan on sensor  (cost=0.00..10726.01 rows=94053 width=16) (actual time=4.273..37.238 rows=93601 loops=1)
     Filter: ((received_time >= '2022-12-12 00:00:00'::timestamp without time zone) AND (received_time <= '2023-02-15 00:00:00'::timestamp without time zone))
     Rows Removed by Filter: 432000
     Planning Time: 9.145 ms
     Execution Time: 40.373 ms
  4. Create a Btree index and re-run the query (two times faster):

    create index time_btree_idx on sensor(received_time);
    
    explain analyze select * from sensor where received_time 
    between '2022-12-12'::timestamp and '2023-2-15'::timestamp;
    
                                   
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
     Index Scan using time_btree_idx on sensor  (cost=0.42..3333.73 rows=91365 width=16) (actual time=0.053..13.992 rows=93601 loops=1)
       Index Cond: ((received_time >= '2022-12-12 00:00:00'::timestamp without time zone) AND (received_time <= '2023-02-15 00:00:00'::timestamp without time zone))
     Planning Time: 0.185 ms
     Execution Time: 18.116 ms
  5. Get the size of the index:

     select * from pg_size_pretty(pg_table_size('time_btree_idx'));
     pg_size_pretty 
    ----------------
     11 MB
  6. Now, create a BRIN index and execute the same query:

    drop index time_btree_idx;
    
    create index time_brin_idx on sensor using brin(received_time);
    
    explain analyze select * from sensor where received_time 
    between '2022-12-12'::timestamp and '2023-2-15'::timestamp;
                                                                              QUERY PLAN                                                                               
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on sensor  (cost=35.13..4591.04 rows=91871 width=16) (actual time=2.545..12.784 rows=93601 loops=1)
       Recheck Cond: ((received_time >= '2022-12-12 00:00:00'::timestamp without time zone) AND (received_time <= '2023-02-15 00:00:00'::timestamp without time zone))
       Rows Removed by Index Recheck: 24799
       Heap Blocks: lossy=640
       ->  Bitmap Index Scan on time_brin_idx  (cost=0.00..12.16 rows=114261 width=0) (actual time=0.064..0.064 rows=6400 loops=1)
             Index Cond: ((received_time >= '2022-12-12 00:00:00'::timestamp without time zone) AND (received_time <= '2023-02-15 00:00:00'::timestamp without time zone))
     Planning Time: 3.110 ms
     Execution Time: 16.506 ms
  7. And check the BRIN size:

    select * from pg_size_pretty(pg_table_size('time_brin_idx'));
     pg_size_pretty 
    ----------------
     48 kB
    (1 row)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment