- Macbook Pro M3
- RAM: 36 GiB
- SSD: 500 GB
- MySQL 5.7, native (non-Docker) installation
- Buffer Pool: 9 GiB
mysql> SHOW CREATE TABLE CustomerFlat\G
*************************** 1. row ***************************
Table: CustomerFlat
Create Table: CREATE TABLE `CustomerFlat` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) NOT NULL,
`lname` varchar(255) DEFAULT NULL,
`email` varchar(254) NOT NULL,
`ssn` char(11) NOT NULL,
`uuidv7` char(36) NOT NULL,
`created_at` datetime NOT NULL,
`city` varchar(255) NOT NULL,
`state` varchar(255) NOT NULL,
`zip` char(5) NOT NULL,
`phone` char(12) NOT NULL,
`status` enum('ACTIVE','INACTIVE','PENDING','SUSPENDED','CLOSED','BLOCKED','EXPIRED','VERIFIED','DELETED','UNDER REVIEW') NOT NULL,
`status_account` enum('ACTIVE','INACTIVE','PENDING','SUSPENDED','CLOSED','VERIFIED','LOCKED','FRAUDULENT','DISABLED','TERMINATED') NOT NULL,
`status_card` enum('ACTIVE','INACTIVE','PENDING','BLOCKED','EXPIRED','CANCELED','ISSUED','SUSPENDED','UNDER REVIEW','REPORTED LOST') NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `CustomerFlat_uuidv7_unq` (`uuidv7`),
KEY `CustomerFlat_status_idx` (`status`),
KEY `CustomerFlat_status_account_idx` (`status_account`),
KEY `CustomerFlat_status_card_idx` (`status_card`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE CustomerFlatString\G
*************************** 1. row ***************************
Table: CustomerFlatString
Create Table: CREATE TABLE `CustomerFlatString` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) NOT NULL,
`lname` varchar(255) DEFAULT NULL,
`email` varchar(254) NOT NULL,
`ssn` char(11) NOT NULL,
`uuidv7` char(36) NOT NULL,
`created_at` datetime NOT NULL,
`city` varchar(255) NOT NULL,
`state` varchar(255) NOT NULL,
`zip` char(5) NOT NULL,
`phone` char(12) NOT NULL,
`status` varchar(32) NOT NULL,
`status_account` varchar(32) NOT NULL,
`status_card` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `CustomerFlatString_uuidv7_unq` (`uuidv7`),
KEY `CustomerFlatString_status_card_idx` (`status_card`),
KEY `CustomerFlatString_status_account_idx` (`status_account`),
KEY `CustomerFlatString_status_idx` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE CustomerJson\G
*************************** 1. row ***************************
Table: CustomerJson
Create Table: CREATE TABLE `CustomerJson` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fname` varchar(255) NOT NULL,
`lname` varchar(255) DEFAULT NULL,
`email` varchar(254) NOT NULL,
`ssn` char(11) NOT NULL,
`uuidv7` char(36) NOT NULL,
`created_at` datetime NOT NULL,
`city` varchar(255) NOT NULL,
`state` varchar(255) NOT NULL,
`zip` char(5) NOT NULL,
`phone` char(12) NOT NULL,
`metadata` json NOT NULL,
`status` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$.status'))) VIRTUAL NOT NULL,
`status_account` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$.status_account'))) VIRTUAL NOT NULL,
`status_card` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`metadata`,'$.status_card'))) VIRTUAL NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `CustomerFlat_uuidv7_unq` (`uuidv7`),
KEY `CustomerJson_status_idx` (`status`),
KEY `CustomerJson_status_account_idx` (`status_account`),
KEY `CustomerJson_status_card_idx` (`status_card`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
Three tables were created with similar schemata, with one difference: the storage of various status columns.
CustomerFlat
stores the three status columns as anENUM
.CustomerFlatString
stores the three status columns asVARCHAR(32)
.CustomerJson
stores the three status columns as aJSON
object.
Each table has an INTEGER AUTO_INCREMENT PRIMARY KEY
id
column, a UNIQUE
index on uuidv7
,
and a non-unique index on each status column. For CustomerJson
, this is accomplished via the use of
GENERATED ALWAYS AS... VIRTUAL
columns where the values are extracted from the JSON object; these virtual
scalar columns are then indexed (the index itself is of course stored).
Each table has 10,000,000 rows. Sizes in GiB are below, as well as the performance for a simple SELECT COUNT(*)
with predicates on two status columns.
Finally, both OPTIMIZE TABLE
and ANALYZE TABLE
were ran after loading and index creation to ensure a fair comparison.
A stored procedure was created to query INFORMATION_SCHEMA.TABLES
for data size (consisting of data and the PK),
as well as secondary index size.
mysql> CALL sizeof(DATABASE(), 'CustomerFlat');
+-------------------------------------------+
| result |
+-------------------------------------------+
| Data Size: 1.83 GiB, Index Size: 0.84 GiB |
+-------------------------------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> CALL sizeof(DATABASE(), 'CustomerFlatString');
+-------------------------------------------+
| result |
+-------------------------------------------+
| Data Size: 2.11 GiB, Index Size: 1.10 GiB |
+-------------------------------------------+
1 row in set (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
mysql> CALL sizeof(DATABASE(), 'CustomerJson');
+-------------------------------------------+
| result |
+-------------------------------------------+
| Data Size: 2.77 GiB, Index Size: 1.10 GiB |
+-------------------------------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
As expected, CustomerFlatString
and CustomerJson
have identical secondary index sizes, since they're both
indexing exactly the same values. CustomerFlat
's indices are 0.26 GiB smaller due to the status columns as ENUM
.
Additionally, the data in CustomerFlat
is 0.28 GiB smaller, for the same reason. Meanwhile, CustomerJson
's data
size is 0.94 GiB - nearly 1 GiB - larger than the baseline, for a total size difference of 1.2 GiB. On a per-row basis,
this equates to ~128 bytes additional size per row over the baseline.
All tables were thoroughly queried first to ensure as many of the pages were loaded into the buffer pool; additionally, the queries were run multiple times, with the best times for each being shown here.
mysql> SELECT COUNT(*) FROM CustomerFlat WHERE status = 'ACTIVE' AND status_account = 'VERIFIED' AND status_card = 'ISSUED';
+----------+
| COUNT(*) |
+----------+
| 9976 |
+----------+
1 row in set (0.31 sec)
mysql> SELECT COUNT(*) FROM CustomerFlatString WHERE status = 'ACTIVE' AND status_account = 'VERIFIED' AND status_card = 'ISSUED';
+----------+
| COUNT(*) |
+----------+
| 9976 |
+----------+
1 row in set (0.39 sec)
mysql> SELECT COUNT(*) FROM CustomerJson WHERE status = 'ACTIVE' AND status_account = 'VERIFIED' AND status_card = 'ISSUED';
+----------+
| COUNT(*) |
+----------+
| 9976 |
+----------+
1 row in set (1.23 sec)
Checking EXPLAIN
for each shows that all expected indices were considered.
For CustomerFlat
and CustomerFlatString
, an index merge is suggested, whereas for CustomerJson
, it only used
the index on status_account
. Following the actual query performance with optimizer_trace
shows that it doesn't
even consider trying to intersect indices. I'm not sure why, as I don't see that called out as a restriction in docs.
An index was then added on created_at
for each table, and a simple query checking all those created from 2024 - now
was ran (SELECT COUNT(*) FROM CustomerFlat WHERE created_at >= '2024-01-01 00:00:00'
). The result was nearly identical,
with CustomerFlat
taking 306.62 msec, CustomerFlatString
taking 309.87 msec, and CustomerJson
taking 316.70 msec.
This difference can likely be assigned to the larger row sizes of each table over the baseline. Even though JSON blobs
are stored off-page, they have a 20-byte pointer to the overflow page.