100 Days MySQL

MySQL High Performance Day 1

Overview Architecture


The second (server) layer contains the built-in functions (e.g., dates, times, math, and encryption). Any functionality provided across storage engines lives at this level: stored procedures, triggers, and views.

Storage engines are responsible for storing and retrieving all data stored "in" MySQL. Each storage engine has its own benefits and drawbacks.


This is why the DB server can run into problems when there are too many connections.

Performance Optimization

Connection Pool


Optimization & Execution

MySQL parses queries to create an internal structure (the parse tree) and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, and choosing which indexes to use.

  • Queries
  • Schema
  • Settings
  • DB Engines
  • Index Types
  • Query Cache
  • Parse Query

Concurrency Control

MySQL has to do this at two levels: the server level and the storage engine level

Story: **Mailbox**
What happens when two processes try to deliver messages at the same time to the same mailbox? 
Well-behaved mail delivery systems use locking to prevent corruption. If a client attempts a second delivery while the mailbox is locked, it must wait to acquire the lock itself before delivering its message. 

This scheme works reasonably well in practice, but it gives no support for concurrency.
Because only a single process can change the mailbox at any given time, this approach becomes problematic with a high-volume mailbox.

Read/Write Locks

Reading from the mailbox isn't as troublesome. There's nothing wrong with multiple clients reading the same mailbox simultaneously; because they aren't making changes, nothing is likely to go wrong. But what happens if someone tries to delete message number 25 while programs are reading the mailbox?

Systems that deal with concurrent read/write access typically implement a locking system that consists of two lock types. These locks are usually known as shared locks and exclusive locks, or read locks and write locks

Read locks

Locks on a resource are shared, or mutually nonblocking: many clients can read from a resource at the same time and not interfere with each other

Write locks

Write locks, on the other hand, are exclusive—i.e., they block both read locks and other write locks—because the only safe policy is to have a single client writing to the resource at a given time and to prevent all reads when a client is writing.

MySQL has to prevent one client from reading a piece of data while another is changing it. This happens internally.

Lock Granularity

One way to improve the concurrency of a shared resource is to be more selective about what you lock. Rather than locking the entire resource, lock only the part that contains the data you need to change.

If the system spends too much time managing locks instead of storing and retrieving data, performance can suffer.

A locking strategy is a compromise between lock overhead and data safety, and that compromise affects performance.

Table locks

  • It locks the entire table
  • When a client wishes to write to a table (insert, delete, update, etc.), it acquires a write lock. This keeps all other read and write operations at bay.
  • Write locks also have a higher priority than read locks, so a request for a write lock will advance to the front of the lock queue even if readers are already in the queue.
  • Write locks can advance past read locks in the queue, but read locks cannot advance past write locks.
  • Storage engines can manage their own locks

For instance, the server uses a table-level lock for statements such as ALTER TABLE, regardless of the storage engine.

Row locks

  • The locking style that offers the greatest concurrency (and carries the greatest overhead) is the use of row locks
  • Row-level locking, as this strategy is commonly known, is available in the InnoDB and XtraDB storage engines, among others.
  • Row locks are implemented in the storage engine, not the server.
  • The server is completely unaware of locks implemented in the storage engines

Transactions

  • A transaction is a group of SQL queries that are treated atomically, as a single unit of work

It’s all or nothing

1 START TRANSACTION;
2 SELECT balance FROM checking WHERE customer_id = 10233276;
3 UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
4 UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276;
5 COMMIT;

But transactions alone aren’t the whole story. What happens if the database server crashes while performing line 4? Who knows? The customer probably just lost $200. And what if another process comes along between lines 3 and 4 and removes the entire checking account balance? The bank has given the customer a $200 credit without even knowing it.
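If the application detects a problem before committing, it can undo the whole unit of work with ROLLBACK. A minimal sketch, reusing the same tables as above:

START TRANSACTION;
UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276;
-- something looks wrong (e.g., the second update failed): undo both changes
ROLLBACK;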

Transactions aren’t enough unless the system passes the ACID test.

ACID

ACID stands for Atomicity, Consistency, Isolation, and Durability.

Atomicity

A transaction must function as a single indivisible unit of work so that the entire transaction is either applied or rolled back. When transactions are atomic, there is no such thing as a partially completed transaction: it’s all or nothing

Consistency

The database should always move from one consistent state to the next. In our example, consistency ensures that a crash between lines 3 and 4 doesn’t result in $200 disappearing from the checking account. Because the transaction is never committed, none of the transaction’s changes are ever reflected in the database

Isolation

The results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank account summary runs after line 3 but before line 4 in our example, it will still see the $200 in the checking account. When we discuss isolation levels, you’ll understand why we said usually invisible

Durability

Once committed, a transaction's changes are permanent. This means the changes must be recorded such that data won't be lost in a system crash. Durability is a slightly fuzzy concept, however, because there are actually many levels. Some durability strategies provide a stronger safety guarantee than others, and nothing is ever 100% durable (if the database itself were truly durable, then how could backups increase durability?). We discuss what durability really means in MySQL in later chapters.
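For example, InnoDB exposes one of these durability trade-offs through its innodb_flush_log_at_trx_commit variable. A small sketch of inspecting and setting it (requires appropriate privileges; values 0 and 2 trade safety for speed):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 (the default) flushes the log to disk at every commit: the most durable setting
SET GLOBAL innodb_flush_log_at_trx_commit = 1;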

A database server with ACID transactions also generally requires more CPU power, memory, and disk space than one without them

Isolation Levels

Lower isolation levels typically allow higher concurrency and have lower overhead.

image

  • READ UNCOMMITTED - breaks isolation: a transaction can see other transactions' uncommitted changes (dirty reads)
  • READ COMMITTED (the PostgreSQL default) - solves dirty reads, but has the non-repeatable read issue
  • REPEATABLE READ (the MySQL default) - solves non-repeatable reads, but still has the phantom read issue
  • SERIALIZABLE - solves phantoms, but lock waits and timeouts become an issue

InnoDB and XtraDB solve the phantom read problem with multiversion concurrency control

Non-Repeatable Reads

This means you can run the same statement twice within a single transaction and see different data each time.
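A minimal two-session sketch, assuming a simple hypothetical accounts table, of how this shows up under READ COMMITTED:

-- session 1
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1; -- returns 100
-- session 2 (runs meanwhile, autocommits)
UPDATE accounts SET balance = 200 WHERE id = 1;
-- session 1, same transaction
SELECT balance FROM accounts WHERE id = 1; -- now returns 200: a non-repeatable read
COMMIT;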

Phantom Reads

A phantom read can happen when you select some range of rows, another transaction inserts a new row into the range, and then you select the same range again; you will then see the new “phantom” row.
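A sketch of the same two-session pattern, at an isolation level that permits phantoms (the accounts table is again hypothetical):

-- session 1
START TRANSACTION;
SELECT * FROM accounts WHERE id BETWEEN 1 AND 10; -- returns 3 rows
-- session 2 (runs meanwhile, autocommits)
INSERT INTO accounts (id, balance) VALUES (5, 50);
-- session 1, same transaction
SELECT * FROM accounts WHERE id BETWEEN 1 AND 10; -- a 4th "phantom" row can appear
COMMIT;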


Deadlocks

A deadlock is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle of dependencies.

# trans 1
START TRANSACTION;
UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 AND date = '2002-05-01';
UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 AND date = '2002-05-02';
COMMIT;
# trans 2
START TRANSACTION;
UPDATE StockPrice SET high = 20.12 WHERE stock_id = 3 AND date = '2002-05-02';
UPDATE StockPrice SET high = 47.20 WHERE stock_id = 4 AND date = '2002-05-01';
COMMIT;

With unlucky timing, each transaction will execute its first query and update a row of data, locking it in the process. Each transaction will then attempt to update its second row, only to find that it is already locked. The two transactions will wait forever for each other to complete, unless something intervenes to break the deadlock.

The InnoDB storage engine will notice circular dependencies and return an error instantly. The way InnoDB currently handles deadlocks is to roll back the transaction that has the fewest exclusive row locks.

Lock behavior and order are storage engine–specific, so some storage engines might deadlock on a certain sequence of statements even though others won’t

Deadlocks have a dual nature: some are unavoidable because of true data conflicts, and some are caused by how a storage engine works

Deadlocks cannot be broken without rolling back one of the transactions, either partially or wholly. They are a fact of life in transactional systems, and your applications should be designed to handle them. Many applications can simply retry their transactions from the beginning.

Transaction Logging

Instead of updating the data files on disk each time a change occurs, the storage engine can change its in-memory copy of the data and append a record of the change to a transaction log on disk (fast, sequential I/O), applying the change to the data files later. This is write-ahead logging.

Transactions in MySQL

MySQL provides two transactional storage engines: InnoDB and NDB Cluster. Several third-party engines are also available; the best-known engines right now are XtraDB and PBXT. We discuss some specific properties of each engine in the next section

AUTOCOMMIT


mysql> SHOW VARIABLES LIKE 'AUTOCOMMIT';
mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
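A small sketch of working with AUTOCOMMIT disabled for a session:

SET AUTOCOMMIT = 0; -- 0/OFF disables autocommit for this session
UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
COMMIT; -- nothing is permanent until here
SET AUTOCOMMIT = 1; -- each statement commits on its own again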

Mixing storage engines in transactions

If you mix transactional and nontransactional tables (for instance, InnoDB and MyISAM tables) in a transaction, the transaction will work properly if all goes well.

However, if a rollback is required, the changes to the nontransactional table can’t be undone

Implicit and explicit locking

The locking mechanisms described earlier are all implicit. InnoDB handles locks automatically, according to your isolation level.

InnoDB uses a two-phase locking protocol. It can acquire locks at any time during a transaction, but it does not release them until a COMMIT or ROLLBACK.

InnoDB also supports explicit locking, which the SQL standard does not mention at all

START TRANSACTION;
SELECT ... LOCK IN SHARE MODE
SELECT ... FOR UPDATE
COMMIT;

MySQL also supports the LOCK TABLES and UNLOCK TABLES commands, which are implemented in the server, not in the storage engines. These have their uses, but they are not a substitute for transactions.

You should never use LOCK TABLES unless you are in a transaction and AUTOCOMMIT is disabled, no matter what storage engine you are using.
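If you do need LOCK TABLES with a transactional engine, the safe pattern is to disable autocommit and commit before unlocking. A sketch, reusing the StockPrice table from the deadlock example:

SET AUTOCOMMIT = 0;
LOCK TABLES StockPrice WRITE;
UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4;
COMMIT; -- commit the work first...
UNLOCK TABLES; -- ...then release the table lock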

Multiversion Concurrency Control (MVCC)

MVCC is not unique to MySQL: Oracle, PostgreSQL, and some other database systems use it too

MVCC works by keeping a snapshot of the data as it existed at some point in time.

This means transactions can see a consistent view of the data, no matter how long they run. It also means different transactions can see different data in the same tables at the same time.

Each storage engine implements MVCC differently. Some of the variations include optimistic and pessimistic concurrency control.


The result of all this extra record keeping is that most read queries never acquire locks. They simply read data as fast as they can, making sure to select only rows that meet the criteria. The drawbacks are that the storage engine has to store more data with each row, do more work when examining rows, and handle some additional housekeeping operations.


E.g., PostgreSQL defaults to READ COMMITTED; MySQL's InnoDB defaults to REPEATABLE READ.
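In InnoDB you can also pin the snapshot explicitly when the transaction starts. A minimal sketch:

START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- every plain SELECT in this transaction reads from the same snapshot,
-- no matter what other transactions commit in the meantime
SELECT COUNT(*) FROM customers;
COMMIT;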

 SHOW TABLE STATUS LIKE 'customers' \G


InnoDB tablespace

The physical storage structure of InnoDB


The logical storage structure of InnoDB


  • Defaults to the REPEATABLE READ isolation level, and it has a next-key locking strategy that prevents phantom reads at this isolation level: rather than locking only the rows you've touched in a query, InnoDB locks gaps in the index structure as well, preventing phantoms from being inserted.
  • InnoDB tables are built on a clustered index.
  • Provides very fast primary key lookups.
  • Secondary indexes (indexes that aren't the primary key) contain the primary key columns, so if the primary key is large, the other indexes will also be large.
  • Platform neutral: you can copy the data and index files from an Intel-based server to PowerPC or Sun SPARC without any trouble.

You should strive for a small primary key if you will have many indexes on a table.

Summary

  • Transactions?
  • Backups (hot backup)
  • Crash recovery
  • Special features

Use Cases

  • Logging? Which kind of DB Engines? Can be Archive/MyISAM
  • Read-only or read-mostly tables?
  • Order processing => InnoDB
  • Bulletin boards and threaded discussion forums =>
  • Large data volumes
  • Many InnoDB databases run in the 3 TB to 5 TB range, or even larger

Table Conversions

  • The easiest way to move a table from one engine to another is with an ALTER TABLE statement. The following command converts mytable to InnoDB:
ALTER TABLE mytable ENGINE = InnoDB;

This syntax works for all storage engines, but there’s a catch: it can take a lot of time. MySQL will perform a row-by-row copy of your old table into a new table. During that time, you’ll probably be using all of the server’s disk I/O capacity, and the original table will be read-locked while the conversion runs

  • Dump and import
  • CREATE and SELECT
CREATE TABLE innodb_table LIKE myisam_table;
ALTER TABLE innodb_table ENGINE=InnoDB;
INSERT INTO innodb_table SELECT * FROM myisam_table;
# with large data, copy in chunks inside explicit transactions
START TRANSACTION;
INSERT INTO innodb_table SELECT * FROM myisam_table
  WHERE id BETWEEN x AND y;
COMMIT;

MySQL High Performance Day 2 - Benchmarking MySQL

Why?

  • Validate your assumptions about the system, and see whether your assumptions are realistic
  • Reproduce a bad behavior you’re trying to eliminate in the system
  • Measure how your application currently performs
  • Simulate a higher load than your production systems handle, to identify the scalability bottleneck that you'll encounter first with growth
  • Plan for growth. Benchmarks can help you estimate how much hardware, network capacity, and other resources you’ll need for your projected future load
  • Test your application’s ability to tolerate a changing environment
  • Test different hardware, software, and operating system configurations. Is RAID 5 or RAID 10 better for your system
  • Prove that your newly purchased hardware is correctly configured
  • You can also use benchmarks for other purposes, such as to create a unit test suite for your application

Benchmarking Strategies

There are two primary benchmarking strategies: you can benchmark the application as a whole, or isolate MySQL

Fullstack vs Single Component Benchmarking

  • You’re testing the entire application, including the web server, the application code, the network, and the database. This is useful because you don’t care about MySQL’s performance in particular; you care about the whole application
  • MySQL is not always the application bottleneck, and a full-stack benchmark can reveal this
  • Benchmarks are good only to the extent that they reflect your actual application’s behavior, which is hard to do when you’re testing only part of it

MySQL benchmark:

  • You want to compare different schemas or queries.
  • You want to benchmark a specific problem you see in the application
  • You want to avoid a long benchmark in favor of a shorter one that gives you a faster “cycle time” for making and measuring changes

What to Measure

  • Throughput: the number of transactions per unit of time. Tools: https://www.tpc.org/ - OLTP benchmarks, suitable for interactive multiuser applications. The usual unit is transactions per second, although it is sometimes transactions per minute
  • Response time or latency: the total time a task requires. It's common to use percentile response times instead of averages or maximums
  • Concurrency: the working concurrency, i.e., the number of threads or connections doing work simultaneously
  • Scalability: scalability measurements are useful for systems that need to maintain performance under a changing workload. Ideally, the system gets twice as much work done (twice the throughput) when you double the number of workers trying to complete tasks

Eg: Concurrency

A website with “50,000 users at a time” might require only 10 or 15 simultaneously running queries on the MySQL server!

Concurrency is completely different from other metrics such as response time and throughput: it’s usually not an outcome, but rather a property of how you set up the benchmark

Benchmarking Tactics

Common mistakes

  • Using a subset of the real data size, such as using only one gigabyte of data when the application will need to handle hundreds of gigabytes, or using the current dataset when you plan for the application to grow much larger
  • Using incorrectly distributed data, such as uniformly distributed data when the real system’s data will have “hot spots.” (Randomly generated data is almost always unrealistically distributed.)
  • Using unrealistically distributed parameters, such as pretending that all user profiles are equally likely to be viewed.
  • Using a single-user scenario for a multiuser application
  • Benchmarking a distributed application on a single server.
  • Failing to match real user behavior, such as “think time” on a web page. Real users request a page and then read it; they don’t click on links one after another without pausing.
  • Running identical queries in a loop. Real queries aren’t identical, so they cause cache misses. Identical queries will be fully or partially cached at some level.
  • Failing to check for errors. If a benchmark’s results don’t make sense—e.g., if a slow operation suddenly completes very quickly—check for errors. Always check error logs after benchmarks, as a matter of principle.
  • Ignoring how the system performs when it’s not warmed up, such as right after a restart
  • Sometimes you need to know how long it’ll take your server to reach capacity after restart
  • Using default server settings
  • Benchmarking too quickly. Your benchmark needs to last a while

Designing and Planning a Benchmark

Tools

Fullstack tools


Single-Component Tools
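One quick-and-dirty single-component tool is MySQL's built-in BENCHMARK() function, which evaluates an expression repeatedly; it measures only expression evaluation speed (the client's timer shows the elapsed time), not a realistic workload. A sketch:

SELECT BENCHMARK(1000000, MD5('benchmark me')); -- run MD5() a million times
SELECT BENCHMARK(1000000, SHA1('benchmark me')); -- compare against another hash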

MySQL High Performance Day 3 - Profiling Server Performance

Performance is response time

Optimization Through Profiling

Profiling is the primary means of measuring and analyzing where time is consumed. Profiling entails two steps: measuring tasks and the time elapsed, and aggregating and sorting the results so that the important tasks bubble to the top

Report


Tools:
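One server-side option is the session profiler (deprecated in recent MySQL versions in favor of the Performance Schema, but handy for quick checks). A sketch:

SET profiling = 1; -- enable per-query profiling for this session
SELECT COUNT(*) FROM phpguru_users;
SHOW PROFILES; -- list recent queries with their durations
SHOW PROFILE FOR QUERY 1; -- break query 1 down into execution stages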

Interpreting the Profile

  • Worthwhile queries

A query that consumes only 5% of total response time can contribute only 5% to overall speedup

  • Outliers : Tasks might need to be optimized even if they don’t sort to the top of the profile

If an occasional task is very slow, it might be unacceptable to users, even though it doesn’t happen often enough to constitute a significant portion of overall response time

  • Unknown unknowns: A good profiling tool will show you the “lost time,” if possible

Lost time is the amount of wall-clock time not accounted for in the tasks measured

  • Buried details: Averages are dangerous because they hide information from you, and the average isn't a good indication of the whole. It's better to have histograms, percentiles, standard deviations, and the index of dispersion

Profiling Your Application

MySQL High Performance Day 4 - Optimizing Schema and Data Types

The computer resources you need to optimize: space on disk and in memory, space in the CPU cache, and the CPU cycles required for processing

1. Smaller is usually better

  • Smaller data types are usually better

2. Simple is good

  • Fewer CPU cycles are typically required to process operations on simpler data types.
  • E.g., integers are cheaper to compare than characters, because character sets and collations (sorting rules) make character comparisons complicated.
  • For instance, you should store dates and times in MySQL's built-in types instead of strings, and IP addresses as integers (see the sketch below).
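For example, MySQL's INET_ATON() and INET_NTOA() functions convert between the dotted-quad string and the integer form, so an IPv4 address fits in an INT UNSIGNED column:

SELECT INET_ATON('192.168.1.1'); -- 3232235777
SELECT INET_NTOA(3232235777); -- '192.168.1.1'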

3. Avoid NULL if possible

  • A lot of tables include nullable columns even when the application does not need to store NULL (the absence of a value), merely because it's the default. It's usually best to declare columns NOT NULL unless you intend to store NULL in them.
  • It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL.
  • When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM.
  • The performance improvement from changing NULL columns to NOT NULL is usually small, so don't make it a priority to find and change them in an existing schema unless you know they are causing problems. However, if you're planning to index columns, avoid making them nullable if possible.
  • There are exceptions, of course: InnoDB stores NULL with a single bit, so it can be fairly space-efficient.

Numbers

Integer

  • Smallest one: TINYINT
  • Biggest one: BIGINT

A signed N-bit integer stores values from −2^(N−1) to 2^(N−1)−1 (e.g., TINYINT: −128 to 127).

Your choice determines how MySQL stores the data, in memory and on disk. However, integer computations generally use 64-bit BIGINT integers, even on 32-bit architectures.

For storage and computational purposes, INT(1) is identical to INT(20).
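A quick sketch demonstrating this: the number in parentheses only affects how some clients display the value, not the range that can be stored (the demo table name is made up):

CREATE TABLE int_width_demo (a INT(1), b INT(20));
INSERT INTO int_width_demo VALUES (2000000000, 2000000000); -- both succeed
SELECT * FROM int_width_demo; -- both columns return the full value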

Real Numbers

Real numbers are numbers that have a fractional part

  • The DECIMAL type is for storing exact fractional numbers; it can also store integers that are too large for BIGINT

References: https://dev.mysql.com/doc/refman/8.0/en/integer-types.html

DATE, TIME, DATETIME, TIMESTAMP, and YEAR

The date and time data types for representing temporal values are DATE, TIME, DATETIME, TIMESTAMP, and YEAR

TIMESTAMP and DATETIME columns can be automatically initialized and updated to the current date and time (that is, the current timestamp).

  • An auto-initialized column is set to the current timestamp for inserted rows that specify no value for the column.
  • An auto-updated column is automatically updated to the current timestamp when the value of any other column in the row is changed from its current value; it remains unchanged if all other columns are set to their current values. (See the sketch below.)
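A sketch of declaring both behaviors explicitly (table and column names are just for illustration):

CREATE TABLE ts_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  note VARCHAR(100) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- auto-initialized
  updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
             ON UPDATE CURRENT_TIMESTAMP -- auto-updated
);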

MySQL retrieves values for a given date or time type in a standard output format, but it attempts to interpret a variety of formats for input values that you supply

https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html

  • MySQL permits you to store a “zero” value of '0000-00-00'
  • “Zero” date or time values used through Connector/ODBC are converted automatically to NULL because ODBC cannot handle such values.

Choosing Optimal Data Types


MySQL High Performance Day 5 - Indexing for High Performance

Indexes (also called “keys” in MySQL) are data structures that storage engines use to find rows quickly

Index Basics

  • Basically, an index works like the index of a book, where you can look up the page number for the content you want.
  • MySQL performs a lookup on the values in the index and returns any rows containing the specified value.
  • An index contains values from one or more columns in a table.
  • If you index more than one column, the column order is very important, because MySQL can only search efficiently on a leftmost prefix of the index

Left Most Prefix of The Index

Creating an index on two columns is not the same as creating two separate single-column indexes

Types of Indexes

There are many types of indexes, each designed to perform well for different purposes. Indexes are implemented in the storage engine layer, not the server layer.

B-Tree indexes

  • InnoDB uses B+Trees.
  • InnoDB leaves values uncompressed in its indexes
  • InnoDB refers to them by their primary key values


Examples

We have a table that looks like this:

CREATE TABLE IF NOT EXISTS `phpguru_users`
(
  id BIGINT(20) NOT NULL AUTO_INCREMENT,
  login VARCHAR(60) NOT NULL,
  password VARCHAR(255) NOT NULL,
  email VARCHAR(100) NOT NULL,
  first_name VARCHAR(30) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  type ENUM('admin','member') NOT NULL,
  status ENUM('pending', 'active', 'inactive') NULL DEFAULT 'pending',
  created_at TIMESTAMP NOT NULL,
  updated_at TIMESTAMP NOT NULL,
  is_super_admin BOOLEAN NULL DEFAULT 0,
  PRIMARY KEY (id)
);


It takes ~0.5s when the table has 1M records.

Another Example


With Non-Clustered Indexes



B-TREE

Match the full value

A match on the full key value specifies values for all columns in the index


Match a leftmost prefix

This uses only the first column in the index

Let's create an index on (first_name, last_name):

CREATE INDEX ix_phpguru_users_first_name_last_name ON phpguru_users (first_name, last_name);


Match a column prefix

This matches on the first part of a column's value (e.g., first names starting with 'J'), using only the first column of the index.


Match a range of values

For example, a query on a range of first names (WHERE first_name BETWEEN 'Anna' AND 'John'); this too uses only the first column in the index.

Match one part exactly and match a range on another part

For example, WHERE first_name = 'John' AND last_name LIKE 'S%': an exact match on the first column and a range (prefix) match on the second.

Index-only queries


Index-only scans can answer queries from the index alone, without reading the table rows; MySQL calls an index that can do this for a query a covering index.

For optimal performance, you might need to create indexes with the same columns in different orders to satisfy your queries


Hash indexes

A hash index is built on a hash table and is useful only for exact lookups that use every column in the index.

The InnoDB storage engine has a special feature called adaptive hash indexes. When InnoDB notices that some index values are being accessed very frequently, it builds a hash index for them in memory on top of B-Tree indexes

DROP TABLE IF EXISTS shorturl_hash;
CREATE TABLE shorturl_hash (
    id SERIAL PRIMARY KEY,
    short INT UNSIGNED NOT NULL DEFAULT 0,
    url TEXT NOT NULL
);
DELIMITER //
CREATE TRIGGER shorturl_hash_ins BEFORE INSERT ON shorturl_hash FOR EACH ROW BEGIN
    SET NEW.short = CRC32(NEW.url);
END;
//

CREATE TRIGGER shorturl_hash_upd BEFORE UPDATE ON shorturl_hash FOR EACH ROW BEGIN
    SET NEW.short = CRC32(NEW.url);
END;
//
DELIMITER ;

INSERT INTO shorturl_hash (url) VALUES ('http://jsguru.net');
CREATE INDEX ix_shorturl_hash_short ON shorturl_hash (short);

EXPLAIN SELECT * FROM headfirst.shorturl_hash
WHERE short = CRC32('http://jsguru.net') AND url = 'http://jsguru.net';



Spatial (R-Tree) indexes

MyISAM supports spatial indexes, which you can use with spatial types such as GEOMETRY

The go-to solution for GIS in an open source RDBMS is PostGIS in PostgreSQL

What is GIS?

Geographic information systems took shape in the 1960s and have spread very widely over the past decade. Today, GIS serves as a decision-support tool in the economic, social, and national-defense activities of many countries around the world.


Full-text indexes

FULLTEXT is a special type of index that finds keywords in the text instead of comparing values directly to the values in the index

It has many subtleties, such as stopwords, stemming and plurals, and Boolean searching
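A minimal sketch, assuming a hypothetical articles table (InnoDB has supported FULLTEXT since MySQL 5.6):

CREATE TABLE articles (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200) NOT NULL,
  body TEXT NOT NULL,
  FULLTEXT KEY ft_title_body (title, body)
);
-- natural-language search
SELECT id, title FROM articles WHERE MATCH(title, body) AGAINST ('storage engine');
-- boolean mode: require "innodb", exclude "myisam"
SELECT id, title FROM articles WHERE MATCH(title, body) AGAINST ('+innodb -myisam' IN BOOLEAN MODE);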


Benefits of Indexes

  • Indexes reduce the amount of data the server has to examine.
  • Indexes help the server avoid sorting and temporary tables.
  • Indexes turn random I/O into sequential I/O

Is an index the best solution?

An index isn’t always the right tool. At a high level, keep in mind that indexes are most effective when they help the storage engine find rows without adding more work than they avoid

  • For very small tables, it is often more effective to simply read all the rows in the table
  • For medium to large tables, indexes can be very effective
  • For enormous tables, the overhead of indexing, as well as the work required to actually use the indexes, can start to add up.

In such cases (enormous tables), you might need to choose a technique that identifies groups of rows that are interesting to the query, instead of individual rows => partitioning

If you have lots of tables, it can also make sense to create a metadata table to store some characteristics of interest for your queries

For example, if you execute queries that perform aggregations over rows in a multitenant application whose data is partitioned
into many tables, you can record which users of the system are actually stored in each table, thus letting you simply ignore tables that don’t have information about those users

At the scale of terabytes, locating individual rows doesn’t make sense; indexes are replaced by per-block metadata


Indexing Strategies for High Performance

Isolating the Column

“Isolating” the column means it should not be part of an expression or be inside a function in the query

For example, here’s a query that can’t use the index on first_name:

EXPLAIN SELECT *  FROM headfirst.phpguru_users where LENGTH(first_name) > 5;

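The usual fix is to move the computation to the other side of the comparison so the indexed column stands alone. A sketch of the classic form of this rewrite, using the table's id column:

SELECT * FROM phpguru_users WHERE id + 1 = 5; -- cannot use the index on id
SELECT * FROM phpguru_users WHERE id = 4; -- same rows, but the index is usable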

Prefix Indexes and Index Selectivity

Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1

  • Sometimes you need to index very long character columns, which makes your indexes large and slow. One strategy is to simulate a hash index.
  • You can often save space and get good performance by indexing the first few characters instead of the whole value => prefix indexes (see the sketch below)
  • A highly selective index is good because it lets MySQL filter out more rows when it looks for matches
  • A unique index has a selectivity of 1, which is as good as it gets
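A sketch of choosing a prefix length by measuring selectivity and then creating the prefix index (the column and lengths are illustrative):

SELECT
  COUNT(DISTINCT LEFT(email, 7)) / COUNT(*) AS sel7,
  COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel10,
  COUNT(DISTINCT email) / COUNT(*) AS sel_full
FROM phpguru_users;
-- once a prefix's selectivity approaches the full column's, index just the prefix
ALTER TABLE phpguru_users ADD KEY ix_email_prefix (email(10));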


Joins

