monadplus · September 17, 2020 17:31
diff --git a/use_the_index_luke.txt b/use_the_index_luke.txt
 _   _            _   _            _           _               _           _
 | | | |          | | | |          (_)         | |             | |         | |
 | | | |___  ___  | |_| |__   ___   _ _ __   __| | _____  __   | |    _   _| | _____
 | | | / __|/ _ \ | __| '_ \ / _ \ | | '_ \ / _` |/ _ \ \/ /   | |   | | | | |/ / _ \
 | |_| \__ \  __/ | |_| | | |  __/ | | | | | (_| |  __/>  < _  | |___| |_| |   <  __/
 \___/|___/\___|  \__|_| |_|\___| |_|_| |_|\__,_|\___/_/\_( ) \_____/\__,_|_|\_\___| https://use-the-index-luke.com/sql/
                                                          |/

 Data is stored in heap with no order. Index are redundant data structures for fast lookup.

 Index are composed by:
    - Index Leaf Nodes (double-linked list):
        - Logical order (not actually physical order)
        - Pointer as key to the corresponding database entry

    - B-Tree (balanced tree)
        - Each index leaf is stored in a block (a unit of storage)
        - The biggest index leaf of the block is stored in another block
        - We end up with a block that contains pointers to block ... until the leafs

 Index lookup steps:
    1. - Tree traversal                           INDEX UNIQUE SCAN
    2. - Following the leaf node chain            INDEX RANGE SCAN
    3. - Fetching the table data                  TABLE ACCESS BY INDEX ROWID


 The where clause:
    - Order of index columns matters !
    - Do not overuse indexes, they are expensive to maintain
    - index on last_name doens't work for UPPER(last_name)
            - You need to CREATE INDEX index_up_name ON users (UPPER(last_name))
    - current time [SYSDATE] (directly or by function) doesn't work
        - index must be deterministics
    - Rule of thumb: index for equality first—then for ranges.
    - Acces predicate: reduce the range index scan // this is what we want
    - Filter predicates: don't
    - LIKE: only characters before % 'wildcard' are used
        - Avoid LIKE expressions with leading wildcards (e.g., '%TERM').

    SELECT first_name, last_name, date_of_birth
      FROM employees
     WHERE UPPER(last_name) < ?
       AND date_of_birth    < ?

        - Possible solutions:
            1) Create an index for (UPPER(last_name), date_of_birth)
            2) Create one index for UPPER(last_name) and another for date_of_birth

        - One index scan is faster than two 😇

    - Partial index:
        SELECT message
          FROM messages
         WHERE processed = 'N'
           AND receiver  = ?

       CREATE INDEX messages_todo
                 ON messages (receiver)
              WHERE processed = 'N'

      - The Oracle database does not include rows in an index if all indexed columns are NULL.
            That means that every index is a partial index—like having a where clause:

            CREATE INDEX idx
                      ON tbl (A, B, C, ...)
                   WHERE A IS NOT NULL
                      OR B IS NOT NULL
                      OR C IS NOT NULL
                         ...
         - If all date_of_birth are NULL this index won't work:

          SELECT first_name, last_name
            FROM employees
           WHERE date_of_birth IS NULL

            - SOLUTION 😄
                CREATE INDEX emp_dob ON employees (date_of_birth, '1')

       - Dates:
            SELECT ...
              FROM sales
             WHERE TRUNC(sale_date) = TRUNC(sysdate - INTERVAL '1' DAY)

           Solution:
             CREATE INDEX index_name
                       ON sales (TRUNC(sale_date))

       - Numeric strings: DO NOT USE THEM !
             numeric is access predicate but string is not
             Implicit conversion wont use the index

       - Math:
            SELECT a, b
              FROM table_name
             WHERE 3*a + 5 = b

           Solution  CREATE INDEX math ON table_name (3*a - b)
             SELECT a, b
               FROM table_name
              WHERE 3*a - b = -5

      - Index clustering:
            Suppose we have an index for subsidiary_id

            SELECT first_name, last_name, subsidiary_id, phone_number
              FROM employees
             WHERE subsidiary_id = ?               // access predicate
               AND UPPER(last_name) LIKE '%INA%'   // Table access

            Let's cluster the data to avoid full table access

            CREATE INDEX empsubupnam ON employees
                   (subsidiary_id, UPPER(last_name))

            Now the query is faster because the UPPER(last_name) is in order.

       - Avoid Table access, Index-Only Scan:

            SELECT SUM(eur_value)
              FROM sales
             WHERE subsidiary_id = ?

            - Solution:

             CREATE INDEX sales_sub_eur
                 ON sales
                  ( subsidiary_id, eur_value )

            - Be aware ! This query won't use the index at all !

                SELECT SUM(eur_value)
                  FROM sales
                 WHERE subsidiary_id = ?
                   AND sale_date > ?

       - Order by: an order by clause do not need to sort the result explicitly if the relevant index already delivers the rows in the required order.

        CREATE INDEX sales_dt_pr ON sales (sale_date, product_id)

        SELECT sale_date, product_id, quantity
          FROM sales
         WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
         ORDER BY sale_date, product_id

         ASC/DESC matters ? No, if all the statements follow the same order

                - Databases can read indexes in both directions.

         Mixed-order:
                 - CREATE INDEX sales_dt_pr ON sales (sale_date ASC, product_id DESC)

        - Group by: same as Order by wihtout ASC/DESC issues
	_ _ _ _ _ _ _ _
	\| \| \| \| \| \| \| \| (_) \| \| \| \| \| \|
	\| \| \| \|___ ___ \| \|_\| \|__ ___ _ _ __ __\| \| _____ __ \| \| _ _\| \| _____
	\| \| \| / __\|/ _ \ \| __\| '_ \ / _ \ \| \| '_ \ / _` \|/ _ \ \/ / \| \| \| \| \| \| \|/ / _ \
	\| \|_\| \__ \ __/ \| \|_\| \| \| \| __/ \| \| \| \| \| (_\| \| __/> < _ \| \|___\| \|_\| \| < __/
	\___/\|___/\___\| \__\|_\| \|_\|\___\| \|_\|_\| \|_\|\__,_\|\___/_/\_( ) \_____/\__,_\|_\|\_\___\| https://use-the-index-luke.com/sql/
	\|/

	Data is stored in heap with no order. Index are redundant data structures for fast lookup.

	Index are composed by:
	- Index Leaf Nodes (double-linked list):
	- Logical order (not actually physical order)
	- Pointer as key to the corresponding database entry

	- B-Tree (balanced tree)
	- Each index leaf is stored in a block (a unit of storage)
	- The biggest index leaf of the block is stored in another block
	- We end up with a block that contains pointers to block ... until the leafs

	Index lookup steps:
	1. - Tree traversal INDEX UNIQUE SCAN
	2. - Following the leaf node chain INDEX RANGE SCAN
	3. - Fetching the table data TABLE ACCESS BY INDEX ROWID


	The where clause:
	- Order of index columns matters !
	- Do not overuse indexes, they are expensive to maintain
	- index on last_name doens't work for UPPER(last_name)
	- You need to CREATE INDEX index_up_name ON users (UPPER(last_name))
	- current time [SYSDATE] (directly or by function) doesn't work
	- index must be deterministics
	- Rule of thumb: index for equality first—then for ranges.
	- Acces predicate: reduce the range index scan // this is what we want
	- Filter predicates: don't
	- LIKE: only characters before % 'wildcard' are used
	- Avoid LIKE expressions with leading wildcards (e.g., '%TERM').

	SELECT first_name, last_name, date_of_birth
	FROM employees
	WHERE UPPER(last_name) < ?
	AND date_of_birth < ?

	- Possible solutions:
	1) Create an index for (UPPER(last_name), date_of_birth)
	2) Create one index for UPPER(last_name) and another for date_of_birth

	- One index scan is faster than two 😇

	- Partial index:
	SELECT message
	FROM messages
	WHERE processed = 'N'
	AND receiver = ?

	CREATE INDEX messages_todo
	ON messages (receiver)
	WHERE processed = 'N'

	- The Oracle database does not include rows in an index if all indexed columns are NULL.
	That means that every index is a partial index—like having a where clause:

	CREATE INDEX idx
	ON tbl (A, B, C, ...)
	WHERE A IS NOT NULL
	OR B IS NOT NULL
	OR C IS NOT NULL
	...
	- If all date_of_birth are NULL this index won't work:

	SELECT first_name, last_name
	FROM employees
	WHERE date_of_birth IS NULL

	- SOLUTION 😄
	CREATE INDEX emp_dob ON employees (date_of_birth, '1')

	- Dates:
	SELECT ...
	FROM sales
	WHERE TRUNC(sale_date) = TRUNC(sysdate - INTERVAL '1' DAY)

	Solution:
	CREATE INDEX index_name
	ON sales (TRUNC(sale_date))

	- Numeric strings: DO NOT USE THEM !
	numeric is access predicate but string is not
	Implicit conversion wont use the index

	- Math:
	SELECT a, b
	FROM table_name
	WHERE 3*a + 5 = b

	Solution CREATE INDEX math ON table_name (3*a - b)
	SELECT a, b
	FROM table_name
	WHERE 3*a - b = -5

	- Index clustering:
	Suppose we have an index for subsidiary_id

	SELECT first_name, last_name, subsidiary_id, phone_number
	FROM employees
	WHERE subsidiary_id = ? // access predicate
	AND UPPER(last_name) LIKE '%INA%' // Table access

	Let's cluster the data to avoid full table access

	CREATE INDEX empsubupnam ON employees
	(subsidiary_id, UPPER(last_name))

	Now the query is faster because the UPPER(last_name) is in order.

	- Avoid Table access, Index-Only Scan:

	SELECT SUM(eur_value)
	FROM sales
	WHERE subsidiary_id = ?

	- Solution:

	CREATE INDEX sales_sub_eur
	ON sales
	( subsidiary_id, eur_value )

	- Be aware ! This query won't use the index at all !

	SELECT SUM(eur_value)
	FROM sales
	WHERE subsidiary_id = ?
	AND sale_date > ?

	- Order by: an order by clause do not need to sort the result explicitly if the relevant index already delivers the rows in the required order.

	CREATE INDEX sales_dt_pr ON sales (sale_date, product_id)

	SELECT sale_date, product_id, quantity
	FROM sales
	WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
	ORDER BY sale_date, product_id

	ASC/DESC matters ? No, if all the statements follow the same order

	- Databases can read indexes in both directions.

	Mixed-order:
	- CREATE INDEX sales_dt_pr ON sales (sale_date ASC, product_id DESC)

	- Group by: same as Order by wihtout ASC/DESC issues
No results found