@kellabyte
Last active August 29, 2015 13:55

Each index is split into segments of 8 sorted 32-bit integers (8 only to keep the example small enough to read). Each integer is a record_id that matches the term. Each segment is stored in a Key/Value store.

Index A represents the rows that have the term "Canada" in a Country column. Index B represents the rows that have the term "Ontario" in a Province column.

Segments from both indexes will be read off disk using a Key/Value store and intersected to evaluate a conjunction query.
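
As a minimal sketch of the intersection step, assuming both segments are already decoded into sorted Python lists, a two-pointer merge walks each list once. The function name `intersect_sorted` is hypothetical and only for illustration.

```python
def intersect_sorted(a, b):
    """Return the values present in both sorted lists, in ascending order."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same record_id in both segments: it satisfies the conjunction.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to appear later in b
        else:
            j += 1  # b[j] is too small to appear later in a
    return result


# Segment 2 of Index A against Segment 1 of Index B.
print(intersect_sorted([18, 20, 22, 24, 26, 28, 30, 32],
                       [18, 20, 22, 24, 26, 28, 30, 48]))
```

Because both inputs are sorted, the work is linear in the combined length of the two segments.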

Index A | Index B
-----------------
     Segment 1
-----------------
      2 | 18  <--- Should skip intersection until Segment 2 of Index A is decoded?
      4 | 20
      6 | 22
      8 | 24
     10 | 26
     12 | 28
     14 | 30
     16 | 48  <--- Notice this record_id.
-----------------
     Segment 2
-----------------
     18 | 
     20 | 
     22 | 
     24 | 
     26 | 
     28 | 
     30 | 
     32 | 
-----------------
     Segment 3
-----------------
     34 | 
     36 | 
     38 | 
     40 | 
     42 | 
     44 | 
     46 | 
     48 | 
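
One possible way to handle the layout in the table above is sketched here. It keeps one cursor per index, so the position inside Index B's segment carries over when Index A moves to its next segment, and it skips a whole segment when its last record_id is below the other side's current value. This assumes the last record_id of a segment can be known cheaply (for example from the key or segment metadata), which the original description does not state. The name `intersect_segmented` is hypothetical.

```python
INDEX_A = [
    [2, 4, 6, 8, 10, 12, 14, 16],
    [18, 20, 22, 24, 26, 28, 30, 32],
    [34, 36, 38, 40, 42, 44, 46, 48],
]
INDEX_B = [[18, 20, 22, 24, 26, 28, 30, 48]]


def intersect_segmented(segs_a, segs_b):
    """Intersect two segmented indexes; also report which A segments were skipped."""
    matches = []
    skipped_a = []   # 1-based numbers of A segments skipped without scanning
    ia = ib = 0      # current segment in each index
    pa = pb = 0      # position inside the current segment
    while ia < len(segs_a) and ib < len(segs_b):
        seg_a, seg_b = segs_a[ia], segs_b[ib]
        # Assumption: seg[-1] stands in for a cheaply available "max record_id".
        if pa == 0 and seg_a[-1] < seg_b[pb]:
            skipped_a.append(ia + 1)
            ia += 1
            continue
        if pb == 0 and seg_b[-1] < seg_a[pa]:
            ib += 1
            continue
        x, y = seg_a[pa], seg_b[pb]
        if x == y:
            matches.append(x)
            pa += 1
            pb += 1
        elif x < y:
            pa += 1
        else:
            pb += 1
        # Move to the next segment when the current one is exhausted.
        if pa == len(seg_a):
            ia, pa = ia + 1, 0
        if pb == len(seg_b):
            ib, pb = ib + 1, 0
    return matches, skipped_a


matches, skipped = intersect_segmented(INDEX_A, INDEX_B)
print(matches)  # [18, 20, 22, 24, 26, 28, 30, 48]
print(skipped)  # [1]
```

In this sketch Segment 1 of Index A is skipped outright, and Index B's segment is traversed once: the cursor reaches 48 while scanning Segment 2 of Index A and stays there until Segment 3 supplies the match.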

Questions:

  1. Index B has integers that match Segments 2 and 3 from Index A. Do I have to intersect Index B twice?
  2. If the indexes are gigabytes in size, how do I know which segments need to be rewritten if a row is modified?
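
For the second question, one option (an assumption, not something the setup above specifies) is to key each segment by its first record_id and keep those keys in a sorted list. A binary search then finds the single segment whose range covers a modified row, so only that segment needs to be read and rewritten. `SEGMENT_FIRST_IDS` and `segment_for` are hypothetical names.

```python
from bisect import bisect_right

# First record_id of each segment of Index A, in segment order (assumed key scheme).
SEGMENT_FIRST_IDS = [2, 18, 34]


def segment_for(record_id, first_ids):
    """Return the 0-based position of the segment whose range covers record_id."""
    pos = bisect_right(first_ids, record_id) - 1
    # Ids below the first key fall into the first segment.
    return max(pos, 0)


print(segment_for(20, SEGMENT_FIRST_IDS))  # 1, i.e. Segment 2
print(segment_for(48, SEGMENT_FIRST_IDS))  # 2, i.e. Segment 3
```

The lookup cost grows with the logarithm of the number of segments, independent of the total index size; whether the Key/Value store in use supports an equivalent ordered key lookup is not stated in the original.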