Skip to content

Instantly share code, notes, and snippets.

@monadplus
Created September 17, 2020 17:31
Show Gist options
  • Select an option

  • Save monadplus/ccfece014a4c0822ef55ea97786c9ce5 to your computer and use it in GitHub Desktop.

Select an option

Save monadplus/ccfece014a4c0822ef55ea97786c9ce5 to your computer and use it in GitHub Desktop.
Use the Index Luke
_ _ _ _ _ _ _ _
| | | | | | | | (_) | | | | | |
| | | |___ ___ | |_| |__ ___ _ _ __ __| | _____ __ | | _ _| | _____
| | | / __|/ _ \ | __| '_ \ / _ \ | | '_ \ / _` |/ _ \ \/ / | | | | | | |/ / _ \
| |_| \__ \ __/ | |_| | | | __/ | | | | | (_| | __/> < _ | |___| |_| | < __/
\___/|___/\___| \__|_| |_|\___| |_|_| |_|\__,_|\___/_/\_( ) \_____/\__,_|_|\_\___| https://use-the-index-luke.com/sql/
|/
Data is stored in heap with no order. Index are redundant data structures for fast lookup.
Index are composed by:
- Index Leaf Nodes (double-linked list):
- Logical order (not actually physical order)
- Pointer as key to the corresponding database entry
- B-Tree (balanced tree)
- Each index leaf is stored in a block (a unit of storage)
- The biggest index leaf of the block is stored in another block
- We end up with a block that contains pointers to block ... until the leafs
Index lookup steps:
1. - Tree traversal INDEX UNIQUE SCAN
2. - Following the leaf node chain INDEX RANGE SCAN
3. - Fetching the table data TABLE ACCESS BY INDEX ROWID
The where clause:
- Order of index columns matters !
- Do not overuse indexes, they are expensive to maintain
- index on last_name doens't work for UPPER(last_name)
- You need to CREATE INDEX index_up_name ON users (UPPER(last_name))
- current time [SYSDATE] (directly or by function) doesn't work
- index must be deterministics
- Rule of thumb: index for equality first—then for ranges.
- Acces predicate: reduce the range index scan // this is what we want
- Filter predicates: don't
- LIKE: only characters before % 'wildcard' are used
- Avoid LIKE expressions with leading wildcards (e.g., '%TERM').
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE UPPER(last_name) < ?
AND date_of_birth < ?
- Possible solutions:
1) Create an index for (UPPER(last_name), date_of_birth)
2) Create one index for UPPER(last_name) and another for date_of_birth
- One index scan is faster than two 😇
- Partial index:
SELECT message
FROM messages
WHERE processed = 'N'
AND receiver = ?
CREATE INDEX messages_todo
ON messages (receiver)
WHERE processed = 'N'
- The Oracle database does not include rows in an index if all indexed columns are NULL.
That means that every index is a partial index—like having a where clause:
CREATE INDEX idx
ON tbl (A, B, C, ...)
WHERE A IS NOT NULL
OR B IS NOT NULL
OR C IS NOT NULL
...
- If all date_of_birth are NULL this index won't work:
SELECT first_name, last_name
FROM employees
WHERE date_of_birth IS NULL
- SOLUTION 😄
CREATE INDEX emp_dob ON employees (date_of_birth, '1')
- Dates:
SELECT ...
FROM sales
WHERE TRUNC(sale_date) = TRUNC(sysdate - INTERVAL '1' DAY)
Solution:
CREATE INDEX index_name
ON sales (TRUNC(sale_date))
- Numeric strings: DO NOT USE THEM !
numeric is access predicate but string is not
Implicit conversion wont use the index
- Math:
SELECT a, b
FROM table_name
WHERE 3*a + 5 = b
Solution CREATE INDEX math ON table_name (3*a - b)
SELECT a, b
FROM table_name
WHERE 3*a - b = -5
- Index clustering:
Suppose we have an index for subsidiary_id
SELECT first_name, last_name, subsidiary_id, phone_number
FROM employees
WHERE subsidiary_id = ? // access predicate
AND UPPER(last_name) LIKE '%INA%' // Table access
Let's cluster the data to avoid full table access
CREATE INDEX empsubupnam ON employees
(subsidiary_id, UPPER(last_name))
Now the query is faster because the UPPER(last_name) is in order.
- Avoid Table access, Index-Only Scan:
SELECT SUM(eur_value)
FROM sales
WHERE subsidiary_id = ?
- Solution:
CREATE INDEX sales_sub_eur
ON sales
( subsidiary_id, eur_value )
- Be aware ! This query won't use the index at all !
SELECT SUM(eur_value)
FROM sales
WHERE subsidiary_id = ?
AND sale_date > ?
- Order by: an order by clause do not need to sort the result explicitly if the relevant index already delivers the rows in the required order.
CREATE INDEX sales_dt_pr ON sales (sale_date, product_id)
SELECT sale_date, product_id, quantity
FROM sales
WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
ORDER BY sale_date, product_id
ASC/DESC matters ? No, if all the statements follow the same order
- Databases can read indexes in both directions.
Mixed-order:
- CREATE INDEX sales_dt_pr ON sales (sale_date ASC, product_id DESC)
- Group by: same as Order by wihtout ASC/DESC issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment