Tip
Refer to this Skill for code base auditing and reporting.
Tip
Refer to this Skill for code base auditing and reporting.
A Common Table Expression (CTE) in MySQL is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement.
Think of it as a temporary view that exists only for the duration of that single query. It allows you to write cleaner, more readable SQL by breaking down complex queries into smaller, modular building blocks.
CTEs were introduced in MySQL 8.0, so you will need that version or higher to use them.
A CTE always starts with the WITH keyword, followed by the CTE name and the query that defines it.
WITH cte_name AS (
-- Your subquery goes here
SELECT column1, column2
FROM table_name
WHERE condition
)
-- Your main query that uses the CTE
SELECT * FROM cte_name;Before CTEs, if you wanted to use a temporary result set, you had to use nested subqueries or derived tables. This often led to "Pyramid of Doom" code that was incredibly hard to read.
Here is a quick comparison of how a CTE improves readability:
SELECT department_id, AVG(salary)
FROM (
SELECT department_id, salary
FROM employees
WHERE status = 'Active'
) AS active_employees
GROUP BY department_id;WITH active_employees AS (
SELECT department_id, salary
FROM employees
WHERE status = 'Active'
)
SELECT department_id, AVG(salary)
FROM active_employees
GROUP BY department_id;Why this is better: You read a CTE from top to bottom, exactly how your brain processes logic. With subqueries, you have to read from the inside out.
MySQL supports two types of Common Table Expressions:
These are simple CTEs that do not reference themselves. They are purely used to modularize code, join multiple temporary sets together, or simplify complex aggregations.
You can even chain multiple CTEs together by separating them with a comma:
WITH total_sales AS (
SELECT region, SUM(amount) AS total FROM sales GROUP BY region
),
top_regions AS (
SELECT region FROM total_sales WHERE total > 100000
)
SELECT * FROM top_regions;This is where CTEs get supercharged. A recursive CTE is a query that references its own name. It continuously loops through data until a specific condition is met.
This is incredibly useful for dealing with hierarchical or tree-structured data (like organizational charts, category/sub-category loops, or bill-of-materials).
A recursive CTE requires the RECURSIVE keyword and consists of two parts combined by a UNION:
Example: Generating numbers from 1 to 5
WITH RECURSIVE number_sequence AS (
-- Anchor member
SELECT 1 AS n
UNION ALL
-- Recursive member (loops until n = 5)
SELECT n + 1 FROM number_sequence WHERE n < 5
)
SELECT * FROM number_sequence;Database optimization can feel like a dark art, but the EXPLAIN command is your "Matrix vision." It reveals how MySQL plans to execute your query before you run it.
You can't fix what you can't find. Use the Slow Query Log to catch queries that exceed a specific time threshold.
-- Check current settings
SHOW VARIABLES LIKE 'slow_query_log';
SHOW VARIABLES LIKE 'long_query_time';
-- Set threshold to 0.5 seconds (Global)
SET GLOBAL long_query_time = 0.5;
SET GLOBAL slow_query_log = 'ON';Focus on these five critical columns. They tell the story of your query's efficiency.
| Column | Importance | High-Level Meaning |
|---|---|---|
type |
βββ | How MySQL finds the rows (The "Speed Limit"). |
possible_keys |
ββ | The Candidates: Every index MySQL could have used. |
key |
βββ | The Winner: The actual index MySQL decided to use. |
rows |
ββ | Estimate of how many rows MySQL must scan. |
Extra |
βββ | Critical "under-the-hood" details (Sorts, Temps, Loops). |
Think of this as the optimizer's narrowing-down process:
possible_keys: These are the "Shortlisted" indexes. If this is NULL, you have no indexes that match your WHERE or JOIN clauses.key: This is the final choice. MySQL picks the index with the "lowest cost" (usually the one that eliminates the most rows the fastest).Why would they differ? > If
possible_keysshows an index butkeyisNULL, MySQL decided itβs actually faster to scan the whole table (type:ALL) than to use the indexβthis often happens on very small tables or if you are fetching >20% of the total rows.
If the table column looks like a weird tag, MySQL is working with internal data:
<derivedN>: A temporary table generated from a Subquery in your FROM clause. The N refers to the id line in the output where that subquery was defined.<unionM,N>: The internal result of a UNION. MySQL is merging results from lines M and N and performing deduplication.<subqueryN>: A subquery that has been optimized into a specialized lookup table for efficiency.Your goal is to move up this list.
system / const: MySQL finds exactly one row instantly. (Searching by Primary Key).eq_ref: The "Golden Ticket" for joins. A 1-to-1 match using a Unique/Primary key.ref: Using a standard index to find multiple rows.
range: Using an index to find a slice of data (BETWEEN, IN(), >, <).index: The Full Index Scan. MySQL reads the entire index file. Better than a table scan, but still a lot of reading.ALL: The Full Table Scan. MySQL reads every single byte of the table from disk.
WHERE clause.Tip
When is ALL okay?
If the table column shows <derivedN> or <unionM,N> and the "rows" count is very small (e.g., under 100), an ALL scan is perfectly fine. It just means MySQL is reading a small temporary result set it just created in memory.
This column reveals the "extra effort" MySQL has to put in.
| Flag | Impact | Meaning |
|---|---|---|
Using index |
β Good | Covering Index: MySQL got all data from the index itself without touching table rows. |
Using index condition |
β Good | Index Condition Pushdown (ICP): MySQL filtered rows at the storage engine level using the index before even reading the full table rows. (Saves I/O). |
Using index for group-by |
β Good | "Loose Index Scan." MySQL jumped between unique index values (very fast). |
Using where |
MySQL filters rows after fetching them. Watch out if type is ALL. |
|
Using intersect(...) |
Index Merge: MySQL is using two separate indexes and finding the overlap. Better than a scan, but usually slower than a single Composite Index. | |
Using filesort |
π¨ Bad | MySQL had to manually sort data because the index order didn't match your ORDER BY. |
Using temporary |
π¨ Bad | An internal table was built in memory to process the result (Common with UNION or DISTINCT). |
Dependent Subquery |
π Very Bad | The subquery runs once for every row in the outer query. This is a performance killer. |
Note
ICP Explanation:
Normally, the process of fetching data has two steps:
Using index condition means MySQL "pushed" the filtering logic down to the Storage Engine. Instead of handing back every row that might match, the Storage Engine checks the conditions against the index fields first and only hands back the rows that actually match.Example: You have an index on (last_name, birth_year).
Your query is: WHERE last_name LIKE 'Smi%' AND birth_year > 1990.\
Without ICP, MySQL finds all "Smi%" entries and reads the full row for every single one from the disk just to check the year. With Using index condition, MySQL checks the birth_year while it's still looking at the index. It only goes to the disk for the "Smi" people actually born after 1990.
Even with an index, these mistakes force a slow type: ALL scan:
LIKE '%smith' (Slow). LIKE 'smith%' (Fast).WHERE YEAR(date) = 2026 (Slow). Use WHERE date >= '2026-01-01' (Fast).VARCHAR column with a number (e.g., WHERE string_id = 123). MySQL has to convert every row to a number to compare them.If you frequently query WHERE status = 'active' AND created_at > '...', a single index on (status, created_at) is vastly superior to two separate indexes.
(A, B, C), it can be used for queries on (A), (A, B), or (A, B, C). It cannot be used for a query only on (B) or (C).A practical guide to how database indexes function under the hood, how they impact performance, and advanced index types.
INSERT, UPDATE, or DELETE requires updating the index and incurs additional storage costs.Note
The order of columns inside your WHERE clause does not matter because the query optimizer will reorder them to match the index; however, the necessary columns must be present.
-- Composite index definition
CREATE INDEX idx_users_name_country_state ON users(last_name, country, state);
-- BAD: Does NOT use the index (skips the leftmost column entirely)
SELECT * FROM users WHERE country = 'UK';
-- GOOD: Uses the full index (Optimizer reorders the WHERE clause automatically)
SELECT * FROM users WHERE state = 'California' AND country = 'US' AND last_name = 'Smith';
-- PARTIAL / INDEX GAP: Uses the index for `last_name` only.
-- Because `country` is missing, it cannot use the index to look up `state`.
-- The database must manually scan all 'Smith' rows to filter for 'California'.
SELECT * FROM users WHERE last_name = 'Smith' AND state = 'California';=, IN) before range columns (>, <, BETWEEN, LIKE). A database can only evaluate one range condition per index lookup; any index columns specified to the right of a range condition are ignored.-- BAD:
-- INEFFICIENT COMPOSITE INDEX DESIGN: Range column placed first
CREATE INDEX idx_users_created_country ON users(created_at, country);
-- POOR PERFORMANCE: The `>=` range condition halts index traversal.
-- The database uses the index for the date range, but must manually
-- scan every matched row to filter by country.
SELECT * FROM users WHERE country = 'UK' AND created_at >= '2026-01-01';
-- GOOD:
-- Composite index definition (Equality first, Range last)
CREATE INDEX idx_users_country_created ON users(country, created_at);
-- OPTIMAL: Uses both columns of the index because equality comes first.
SELECT * FROM users WHERE country = 'UK' AND created_at >= '2026-01-01';WHERE LOWER(name) = 'pokemon') breaks standard index usage and triggers a full table scan.
-- BAD: Standard index on `name` is ignored here
SELECT * FROM users WHERE LOWER(name) = 'pokemon';
-- GOOD: Create an expression/functional index
CREATE INDEX idx_users_lower_name ON users(LOWER(name));WHERE clause. Perfect for optimizing queries on low-cardinality data or specific filtered subsets, saving significant storage.-- Index only active accounts (saves space if most users are inactive)
CREATE INDEX idx_active_users ON users(id) WHERE status = 'active';
-- This query utilizes the partial index
SELECT id FROM users WHERE status = 'active' AND id = 42;INCLUDE clause to append non-key payload data directly to the index. This allows the database to fetch the required data purely from the index without reading the underlying table heap.
VACUUM has recently run to map the pages as 100% visible.-- Index by email, but attach the username to the payload
CREATE INDEX idx_users_email_include_username ON users(email) INCLUDE (username);
-- Index Only Scan: The database doesn't need to look at the main table at all
SELECT email, username FROM users WHERE email = 'alex@example.com';EXPLAIN and EXPLAIN ANALYZE to accurately verify whether your database engine is utilizing your indexes as intended.-- Shows the execution plan and execution statistics
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'alex@example.com';In MySQL, a Virtual Column (specifically known as a Generated Column) is a column whose value is automatically calculated from an expression or other columns in the same table, rather than being explicitly inserted or updated by a user.
Think of it like a formula in an Excel spreadsheet: you define the rule once, and the database ensures the data stays in sync.
Instead of writing complex logic in every SELECT statement, you bake the logic into the table schema.
first_name and last_name, you can create a virtual full_name column. You'll never have to manually concatenate them in your code again.This is the "killer feature" of Generated Columns. You cannot directly index the result of a function in a WHERE clause in older versions of MySQL.
WHERE JSON_EXTRACT(data, "$.id") = 10 is slow because it requires a full table scan.Because the database handles the calculation, there is zero risk of the generated value getting "out of sync" with the source data. If the price or tax_rate changes, the total_price column updates automatically and atomically.
You can use Generated Columns to enforce business logic. For example, you could create a generated column that returns TRUE if a complex set of conditions is met, and then place a UNIQUE index on it to prevent invalid data combinations.
MySQL offers two "flavors" of these columns, and the distinction is mostly about storage space versus CPU performance:
| Type | Stored on Disk? | Performance Impact |
|---|---|---|
| VIRTUAL | No | Calculated on-the-fly when the row is read. Uses no disk space but costs a bit of CPU during SELECT. |
| STORED | Yes | Calculated when the row is written/updated. Uses disk space but is faster to read (like a normal column). |
When creating Generated Columns, the choice between VIRTUAL and STORED comes down to a fundamental architectural trade-off: Disk Space vs. CPU Performance.
| Type | Stored on Disk? | Calculation Trigger | Best For... |
|---|---|---|---|
VIRTUAL |
β No | Calculated on-the-fly during a SELECT query. |
Simple calculations and saving storage space. |
STORED |
Yes | Calculated once during an INSERT or UPDATE. |
Complex logic, high-read tables, and sorting optimization. |
Use VIRTUAL as your default choice for lightweight, straightforward operations.
CONCAT(first_name, ' ', last_name)) or basic math where the CPU overhead to compute the value on-the-fly is completely negligible.VIRTUAL prevents write amplification because the database doesn't have to re-compute and re-write the generated data on every single update.Use STORED when data retrieval speed is your absolute highest priority, or when dealing with complex data transformations.
SELECT query will cripple performance. STORED moves that CPU tax to the write phase so reads remain instant.INSERT means every subsequent SELECT reads a pre-computed static value directly from disk.ORDER BY and Aggregations: If you frequently sort by the generated value (e.g., ORDER BY path_pattern_length DESC), a STORED column provides a static, physical property that the database can scan cleanly.Note
While modern MySQL allows you to index VIRTUAL columns (which creates a hidden physical index structure anyway), utilizing a STORED column ensures that the computed data physically resides within the main table row, making it a highly reliable and bulletproof strategy for turning un-indexable function results into static, searchable data.
Here is a breakdown of the syntax:
ALTER TABLE products -- Tell the database we are modifying the structure of the existing products table.
ADD COLUMN discount_price DECIMAL(10,2) -- Create a new column named discount_price.
GENERATED ALWAYS AS (original_price * 0.9) -- Don't let users input data here; calculate it using this rule instead.
STORED; -- This defines how the data is kept.original_price is updated.In routing systems (URL matching), you often want the "most specific" match to win. Since a longer pattern is usually more specific than a shorter one. The following example shows how an index can allow the database to instantly find the longest applicable pattern for a specific version_id and match_type.
You'll see we are using a Generated Column (path_pattern_length). This is necessary because of a fundamental rule in database indexing: You cannot directly index the result of a function unless you use a Generated Column (or a Functional Index); and by indexing this in descending order, we are trying to prioritize longer path patterns first.
This is useful for when you frequently run queries like ORDER BY path_pattern_length DESC LIMIT 10. By having the index already sorted in descending order allows the database to read the values directly from the "start" of the index without having to reverse-scan.
ALTER TABLE path_groups
ADD COLUMN path_pattern_length INT
GENERATED ALWAYS AS (LENGTH(path_pattern)) STORED;
CREATE INDEX idx_version_matchtype_length
ON path_groups (version_id, match_type, path_pattern_length DESC);It eliminates the need for a "Sort" operation in the execution plan if your query asks for the longest paths first.
It allows the database to find the "Maximum" value for that column (within a specific version/match type) in O(1) or O(log n) time.
If we simply created an index on path_pattern, the database would store the strings (e.g., /user/*, /admin/settings/logs). If we then ran a query asking for the length of those strings, the database would have to:
This is a "Full Table Scan," which is extremely slow as your data grows. By creating a generated column, you give the database a static value that it can physically put into an index.