Skip to content

Instantly share code, notes, and snippets.

@stephanGarland
Created February 7, 2025 17:37
Show Gist options
  • Save stephanGarland/1c94fce381ea46f563eb4c0f43afb145 to your computer and use it in GitHub Desktop.
Save stephanGarland/1c94fce381ea46f563eb4c0f43afb145 to your computer and use it in GitHub Desktop.
Comparing VARCHAR, ENUM, and JSON indexed performance in MySQL 5.7

Environment

System

  • Macbook Pro M3
  • RAM: 36 GiB
  • SSD: 500 GB

MySQL

  • MySQL 5.7, native (non-Docker) installation
  • Buffer Pool: 9 GiB

Schemata

mysql> SHOW CREATE TABLE CustomerFlat\G
*************************** 1. row ***************************
       Table: CustomerFlat
Create Table: CREATE TABLE `CustomerFlat` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fname` varchar(255) NOT NULL,
  `lname` varchar(255) DEFAULT NULL,
  `email` varchar(254) NOT NULL,
  `ssn` char(11) NOT NULL,
  `uuidv7` char(36) NOT NULL,
  `created_at` datetime NOT NULL,
  `city` varchar(255) NOT NULL,
  `state` varchar(255) NOT NULL,
  `zip` char(5) NOT NULL,
  `phone` char(12) NOT NULL,
  `status` enum('ACTIVE','INACTIVE','PENDING','SUSPENDED','CLOSED','BLOCKED','EXPIRED','VERIFIED','DELETED','UNDER REVIEW') NOT NULL,
  `status_account` enum('ACTIVE','INACTIVE','PENDING','SUSPENDED','CLOSED','VERIFIED','LOCKED','FRAUDULENT','DISABLED','TERMINATED') NOT NULL,
  `status_card` enum('ACTIVE','INACTIVE','PENDING','BLOCKED','EXPIRED','CANCELED','ISSUED','SUSPENDED','UNDER REVIEW','REPORTED LOST') NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `CustomerFlat_uuidv7_unq` (`uuidv7`),
  KEY `CustomerFlat_status_idx` (`status`),
  KEY `CustomerFlat_status_account_idx` (`status_account`),
  KEY `CustomerFlat_status_card_idx` (`status_card`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE CustomerFlatString\G
*************************** 1. row ***************************
       Table: CustomerFlatString
Create Table: CREATE TABLE `CustomerFlatString` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fname` varchar(255) NOT NULL,
  `lname` varchar(255) DEFAULT NULL,
  `email` varchar(254) NOT NULL,
  `ssn` char(11) NOT NULL,
  `uuidv7` char(36) NOT NULL,
  `created_at` datetime NOT NULL,
  `city` varchar(255) NOT NULL,
  `state` varchar(255) NOT NULL,
  `zip` char(5) NOT NULL,
  `phone` char(12) NOT NULL,
  `status` varchar(32) NOT NULL,
  `status_account` varchar(32) NOT NULL,
  `status_card` varchar(32) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `CustomerFlatString_uuidv7_unq` (`uuidv7`),
  KEY `CustomerFlatString_status_card_idx` (`status_card`),
  KEY `CustomerFlatString_status_account_idx` (`status_account`),
  KEY `CustomerFlatString_status_idx` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE CustomerJson\G
*************************** 1. row ***************************
       Table: CustomerJson
Create Table: CREATE TABLE `CustomerJson` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fname` varchar(255) NOT NULL,
  `lname` varchar(255) DEFAULT NULL,
  `email` varchar(254) NOT NULL,
  `ssn` char(11) NOT NULL,
  `uuidv7` char(36) NOT NULL,
  `created_at` datetime NOT NULL,
  `city` varchar(255) NOT NULL,
  `state` varchar(255) NOT NULL,
  `zip` char(5) NOT NULL,
  `phone` char(12) NOT NULL,
  `metadata` json NOT NULL,
  `status` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$.status'))) VIRTUAL NOT NULL,
  `status_account` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$.status_account'))) VIRTUAL NOT NULL,
  `status_card` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$.status_card'))) VIRTUAL NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `CustomerFlat_uuidv7_unq` (`uuidv7`),
  KEY `CustomerJson_status_idx` (`status`),
  KEY `CustomerJson_status_account_idx` (`status_account`),
  KEY `CustomerJson_status_card_idx` (`status_card`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

Experiment

Overview

Three tables were created with similar schemata, with one difference: the storage of various status columns.

  • CustomerFlat stores the three status columns as an ENUM.
  • CustomerFlatString stores the three status columns as VARCHAR(32).
  • CustomerJson stores the three status columns as a JSON object.

Each table has an INTEGER AUTO_INCREMENT PRIMARY KEY id column, a UNIQUE index on uuidv7, and a non-unique index on each status column. For CustomerJson, this is accomplished via the use of GENERATED ALWAYS AS... VIRTUAL columns where the values are extracted from the JSON object; these virtual scalar columns are then indexed (the index itself is of course stored).

Each table has 10,000,000 rows. Sizes in GiB are below, as well as the performance for a simple SELECT COUNT(*) with predicates on two status columns.

Finally, both OPTIMIZE TABLE and ANALYZE TABLE were ran after loading and index creation to ensure a fair comparison.

Size

A stored procedure was created to query INFORMATION_SCHEMA.TABLES for data size (consisting of data and the PK), as well as secondary index size.

mysql> CALL sizeof(DATABASE(), 'CustomerFlat');
+-------------------------------------------+
| result                                    |
+-------------------------------------------+
| Data Size: 1.83 GiB, Index Size: 0.84 GiB |
+-------------------------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> CALL sizeof(DATABASE(), 'CustomerFlatString');
+-------------------------------------------+
| result                                    |
+-------------------------------------------+
| Data Size: 2.11 GiB, Index Size: 1.10 GiB |
+-------------------------------------------+
1 row in set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> CALL sizeof(DATABASE(), 'CustomerJson');
+-------------------------------------------+
| result                                    |
+-------------------------------------------+
| Data Size: 2.77 GiB, Index Size: 1.10 GiB |
+-------------------------------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

As expected, CustomerFlatString and CustomerJson have identical secondary index sizes, since they're both indexing exactly the same values. CustomerFlat's indices are 0.26 GiB smaller due to the status columns as ENUM. Additionally, the data in CustomerFlat is 0.28 GiB smaller, for the same reason. Meanwhile, CustomerJson's data size is 0.94 GiB - nearly 1 GiB - larger than the baseline, for a total size difference of 1.2 GiB. On a per-row basis, this equates to ~128 bytes additional size per row over the baseline.

Performance

All tables were thoroughly queried first to ensure as many of the pages were loaded into the buffer pool; additionally, the queries were run multiple times, with the best times for each being shown here.

mysql> SELECT COUNT(*) FROM CustomerFlat WHERE status = 'ACTIVE' AND status_account = 'VERIFIED' AND status_card = 'ISSUED';
+----------+
| COUNT(*) |
+----------+
|     9976 |
+----------+
1 row in set (0.31 sec)

mysql> SELECT COUNT(*) FROM CustomerFlatString WHERE status = 'ACTIVE' AND status_account = 'VERIFIED' AND status_card = 'ISSUED';
+----------+
| COUNT(*) |
+----------+
|     9976 |
+----------+
1 row in set (0.39 sec)

mysql> SELECT COUNT(*) FROM CustomerJson WHERE status = 'ACTIVE' AND status_account = 'VERIFIED' AND status_card = 'ISSUED';
+----------+
| COUNT(*) |
+----------+
|     9976 |
+----------+
1 row in set (1.23 sec)

Checking EXPLAIN for each shows that all expected indices were considered. For CustomerFlat and CustomerFlatString, an index merge is suggested, whereas for CustomerJson, it only used the index on status_account. Following the actual query performance with optimizer_trace shows that it doesn't even consider trying to intersect indices. I'm not sure why, as I don't see that called out as a restriction in docs.

An index was then added on created_at for each table, and a simple query checking all those created from 2024 - now was ran (SELECT COUNT(*) FROM CustomerFlat WHERE created_at >= '2024-01-01 00:00:00'). The result was nearly identical, with CustomerFlat taking 306.62 msec, CustomerFlatString taking 309.87 msec, and CustomerJson taking 316.70 msec. This difference can likely be assigned to the larger row sizes of each table over the baseline. Even though JSON blobs are stored off-page, they have a 20-byte pointer to the overflow page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment