This is a report on the chia-blockchain sqlite3 DB. The concern is that some
operations are slow, and get slower over time. The symptom is that the
conversion from v1 of the DB to v2 never finishes, and seems to get slower
asymptotically as time goes on. This may affect performance during regular
day-to-day operation, since DB conversion is not wholly different from the DB
operations performed on a day-to-day basis.
There are six tables and eight indices. Two tables are trivial; two tables are
fast as indexed. Two tables are slow, and coin_record presents special
challenges. We recommend dropping an index on hints and experimenting with
denormalizing alternatives for coin_record.
For the non-trivial tables, benchmarking is done by piping a .dump of the
table to a new sqlite3 process that creates a new DB with just the dumped
table. In some cases, we transform the table by eliding indices or
PRIMARY KEY/UNIQUE constraints to see what the performance impact is. We pipe
through pv -bart to measure the rate of data passing through the pipe.
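As a minimal sketch, the "simple .dump" case for one table looks like this
(the DB path, table name, and row count are illustrative; the variants below
pre-create or elide schema elements before loading):
rm -f temp.db
(sqlite3 blockchain_v2_mainnet.sqlite ".dump full_blocks" | head -500000 && echo 'COMMIT;') | pv -bart | sqlite3 temp.db
The trailing COMMIT is needed because head usually truncates the dump before
its own final COMMIT.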
Each of these two tables has just a single row, so they won't cause performance issues.
full_blocks(header_hash blob PRIMARY KEY, prev_hash blob, height bigint, sub_epoch_summary blob, is_fully_compactified tinyint, in_main_chain tinyint, block blob, block_record blob)
3749003 rows
This table is only needed for archival nodes. Once a block is fully processed, it is never needed again, except to bootstrap other nodes. So this table is all about being "polite" and is completely unnecessary to sync or farm.
This table has three indices.
CREATE INDEX height on full_blocks(height);
CREATE INDEX is_fully_compactified ON full_blocks(is_fully_compactified, in_main_chain) WHERE in_main_chain=1;
CREATE INDEX main_chain ON full_blocks(height, in_main_chain) WHERE in_main_chain=1;
The PK is 32 bytes. It's really an alternate key, since this table doesn't use
WITHOUT ROWID. Tests show that using WITHOUT ROWID slows writes massively,
probably because the table then uses a clustered index.
See https://www.sqlite.org/withoutrowid.html
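For illustration, here is the difference (a sketch trimmed to two columns; the
second table name is hypothetical):
-- rowid table (the current schema): the PRIMARY KEY is enforced by a
-- separate unique index, and rows are stored in rowid order.
CREATE TABLE full_blocks(header_hash blob PRIMARY KEY, block blob);
-- WITHOUT ROWID variant: the whole row lives in a b-tree keyed on the
-- 32-byte header_hash, i.e. a clustered index.
CREATE TABLE full_blocks_wr(header_hash blob PRIMARY KEY, block blob) WITHOUT ROWID;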
500000 rows of full_blocks simple .dump
7.59GiB 0:13:15 [9.77MiB/s] [9.77MiB/s]
500000 rows of full_blocks without PRIMARY KEY
7.59GiB 0:07:53 [16.4MiB/s] [16.4MiB/s]
Writing this table is a little slower than straight writes (i.e. with the
PRIMARY KEY removed), but it's within 50%, so this slowdown isn't
catastrophic. No changes are required at this time.
9756 rows
This table is for sub-epoch challenges. There are not very many rows, but they are very large, averaging 1.13 MB/row.
10000 rows of sub_epoch_segments_v3 simple .dump
10.8GiB 0:05:25 [33.9MiB/s] [33.9MiB/s]
A straight copy is extremely fast. No changes are required.
This table allows wallet protocol queries to be answered without doing full table-scans (which would be prohibitive). It's not needed at all for consensus, but just to provide fully-indexed wallet queries, as is normally done by Electrum-style nodes on the bitcoin network.
This table has one index:
CREATE INDEX hint_index on hints(hint);
11332512 rows
500000 rows of hints simple .dump
77.7MiB 0:00:24 [3.18MiB/s] [3.18MiB/s]
500000 rows of hints with indices pre-created
77.7MiB 0:00:37 [2.08MiB/s] [2.08MiB/s]
500000 rows of hints without PRIMARY KEY
77.7MiB 0:00:24 [3.12MiB/s] [3.12MiB/s]
500000 rows of hints without UNIQUE
77.7MiB 0:00:05 [15.5MiB/s] [15.5MiB/s]
5000000 rows of hints without UNIQUE
760MiB 0:02:23 [5.30MiB/s] [5.30MiB/s]
11332512 rows of hints without UNIQUE
1.70GiB 0:03:50 [7.56MiB/s] [7.56MiB/s]
11332512 rows of hints simple .dump
1.70GiB 3:32:06 [ 139KiB/s] [ 139KiB/s]
Here, the UNIQUE constraint causes a significant, and eventually catastrophic,
slowdown with not much benefit. Duplicated hints waste space, but cause no
runtime problems, as they can be deduplicated at query time. If duplicated
hint rows ever become an issue, an occasional "garbage collection" process can
be run to remove them.
Recommendation: drop the UNIQUE constraint, and consider creating a
garbage-collection process that can be run automatically or manually to remove
duplicate rows.
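A minimal sketch of such a garbage-collection pass, assuming the table has
(coin_id, hint) columns and keeps its implicit rowid (both assumptions; adjust
to the real schema):
DELETE FROM hints
WHERE rowid NOT IN (
  -- keep the first-inserted copy of each (coin_id, hint) pair
  SELECT MIN(rowid) FROM hints GROUP BY coin_id, hint
);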
coin_record(coin_name blob PRIMARY KEY, confirmed_index bigint, spent_index bigint, coinbase int, puzzle_hash blob, coin_parent blob, amount blob, timestamp bigint)
258615917 rows
This table keeps track of coin lifecycles, and which coins are "live" and
spendable. This is the only table that is strictly necessary for consensus.
It has four indices that greatly slow writing in non-linear ways, as well as
a PRIMARY KEY on coin_name.
TLDR SUMMARY: This table presents the greatest performance concerns. Adding even a single index eventually causes significant (10x) slowdowns. Breaking this table into multiple tables that encompass its diverse functionality may improve performance at the cost of complexity and possibly a little redundancy.
This table has four indices:
CREATE INDEX coin_confirmed_index on coin_record(confirmed_index);
CREATE INDEX coin_spent_index on coin_record(spent_index);
CREATE INDEX coin_puzzle_hash on coin_record(puzzle_hash);
CREATE INDEX coin_parent_index on coin_record(coin_parent);
Two of the indices are on small integer columns, and two are on 32-byte blob columns.
Here's the initial summary:
5000000 rows of coin_record simple .dump
1.32GiB 0:47:36 [ 483KiB/s] [ 483KiB/s]
5000000 rows of coin_record with indices pre-created
1.32GiB 2:08:30 [ 179KiB/s] [ 179KiB/s]
5000000 rows of coin_record without PRIMARY KEY
1.32GiB 0:01:37 [13.8MiB/s] [13.8MiB/s]
5000000 rows of coin_record without ROWID
1.32GiB 0:51:48 [ 444KiB/s] [ 444KiB/s]
Note that copying this table speeds up massively without the PRIMARY KEY (or
indices).
5000000 rows of coin_record without PRIMARY KEY
1.32GiB 0:01:42 [13.2MiB/s] [13.2MiB/s]
20000000 rows of coin_record without PRIMARY KEY
5.26GiB 0:06:41 [13.4MiB/s] [13.4MiB/s]
40000000 rows of coin_record without PRIMARY KEY
10.5GiB 0:13:30 [13.3MiB/s] [13.3MiB/s]
The data above shows that there is no slowdown as we continue to add rows.
Let's try using an ordinary index as a replacement for the PRIMARY KEY on
coin_name.
DB=blockchain_v2_mainnet.sqlite
COUNT=10000000
TABLE=coin_record
rm -f temp.db
# Pre-create the table (no PRIMARY KEY) and the index under test.
echo 'CREATE TABLE coin_record(coin_name blob, confirmed_index bigint, spent_index bigint, coinbase int, puzzle_hash blob, coin_parent blob, amount blob, timestamp bigint);' | sqlite3 temp.db
echo 'CREATE INDEX coin_name on coin_record(coin_name);' | sqlite3 temp.db
echo $COUNT rows of ${TABLE} with indices pre-created
# Dump the source table, comment out its CREATE statements (the schema
# already exists), truncate to COUNT lines, and measure the rate with pv.
(sqlite3 ${DB} ".dump ${TABLE}" | sed 's|CREATE|\-\- CREATE|g' | head -${COUNT} && echo 'COMMIT;') | pv -bart | sqlite3 temp.db
1000000 rows of coin_record with indices pre-created
269MiB 0:02:21 [1.91MiB/s] [1.91MiB/s]
10000000 rows of coin_record with indices pre-created
2.63GiB 3:39:26 [ 209KiB/s] [ 209KiB/s]
Unfortunately, this runs into the same issue. Having an index on such a large
(32-byte) column causes a massive slowdown, curiously even worse than just
using PRIMARY KEY.
Now let's try it without the PRIMARY KEY and with just one index, the first
one, which is small.
DB=blockchain_v2_mainnet.sqlite
COUNT=1000000
TABLE=coin_record
rm -f temp.db
echo 'CREATE TABLE coin_record(coin_name blob, confirmed_index bigint, spent_index bigint, coinbase int, puzzle_hash blob, coin_parent blob, amount blob, timestamp bigint);' | sqlite3 temp.db
echo 'CREATE INDEX coin_confirmed_index on coin_record(confirmed_index);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_spent_index on coin_record(spent_index);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_puzzle_hash on coin_record(puzzle_hash);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_parent_index on coin_record(coin_parent);' | sqlite3 temp.db
echo $COUNT rows of ${TABLE} with indices pre-created
(sqlite3 ${DB} ".dump ${TABLE}" | sed 's|CREATE|\-\- CREATE|g' | head -${COUNT} && echo 'COMMIT;') | pv -bart | sqlite3 temp.db
1000000 rows of coin_record with indices pre-created
269MiB 0:00:21 [12.4MiB/s] [12.4MiB/s]
10000000 rows of coin_record with indices pre-created
2.63GiB 0:03:46 [11.9MiB/s] [11.9MiB/s]
With one small index (an int), we slow down a little but not much.
Now let's try it with two small indices.
DB=blockchain_v2_mainnet.sqlite
COUNT=10000000
TABLE=coin_record
rm -f temp.db
echo 'CREATE TABLE coin_record(coin_name blob, confirmed_index bigint, spent_index bigint, coinbase int, puzzle_hash blob, coin_parent blob, amount blob, timestamp bigint);' | sqlite3 temp.db
echo 'CREATE INDEX coin_confirmed_index on coin_record(confirmed_index);' | sqlite3 temp.db
echo 'CREATE INDEX coin_spent_index on coin_record(spent_index);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_puzzle_hash on coin_record(puzzle_hash);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_parent_index on coin_record(coin_parent);' | sqlite3 temp.db
echo $COUNT rows of ${TABLE} with indices pre-created
(sqlite3 ${DB} ".dump ${TABLE}" | sed 's|CREATE|\-\- CREATE|g' | head -${COUNT} && echo 'COMMIT;') | pv -bart | sqlite3 temp.db
1000000 rows of coin_record with indices pre-created
269MiB 0:00:26 [10.2MiB/s] [10.2MiB/s]
10000000 rows of coin_record with indices pre-created
2.63GiB 0:31:00 [1.45MiB/s] [1.45MiB/s]
With the first two indices, both small ints, we slow down quite a lot.
Let's try it with the two large indices. This should be just as bad or worse.
DB=blockchain_v2_mainnet.sqlite
COUNT=10000000
TABLE=coin_record
rm -f temp.db
echo 'CREATE TABLE coin_record(coin_name blob, confirmed_index bigint, spent_index bigint, coinbase int, puzzle_hash blob, coin_parent blob, amount blob, timestamp bigint);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_confirmed_index on coin_record(confirmed_index);' | sqlite3 temp.db
echo '-- CREATE INDEX coin_spent_index on coin_record(spent_index);' | sqlite3 temp.db
echo 'CREATE INDEX coin_puzzle_hash on coin_record(puzzle_hash);' | sqlite3 temp.db
echo 'CREATE INDEX coin_parent_index on coin_record(coin_parent);' | sqlite3 temp.db
echo $COUNT rows of ${TABLE} with indices pre-created
(sqlite3 ${DB} ".dump ${TABLE}" | sed 's|CREATE|\-\- CREATE|g' | head -${COUNT} && echo 'COMMIT;') | pv -bart | sqlite3 temp.db
10000000 rows of coin_record with indices pre-created
2.63GiB 3:06:45 [ 246KiB/s] [ 246KiB/s]
With the last two indices, which are both large, we slow down to really intolerable levels.
This data shows that the indices, especially beyond the first one, slow writing to this table immensely, so this is likely a performance issue.
The first index causes only a small slowdown at first, if it is small, but writes continue to get slower as time goes on. Likely this is because the size of the index eventually exceeds the cache, so each insert requires multiple disk seeks through a b-tree.
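One way to probe this cache hypothesis would be to enlarge SQLite's page cache
before the bulk load and see whether the slowdown point moves; a sketch (the
value is illustrative, and the PRAGMA must run on the same connection doing
the inserts):
# -1000000 means roughly 1 GiB of page cache (negative values are in KiB).
(echo 'PRAGMA cache_size = -1000000;' && sqlite3 ${DB} ".dump ${TABLE}" | sed 's|CREATE|\-\- CREATE|g' | head -${COUNT} && echo 'COMMIT;') | pv -bart | sqlite3 temp.db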
Recommendation:
More testing needs to be done. It seems that replacing this single table with
multiple tables, each having at most one index, may prevent these intolerable
slowdowns. The following tables might work to replace coin_record:
CREATE TABLE blob(bytes32 blob)
CREATE INDEX bytes32 on blob(bytes32)
In blob, we use the built-in ROWID as the primary key.
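A hypothetical sketch of interning a 32-byte value and recovering its compact
integer id (zeroblob(32) stands in for a real hash):
INSERT INTO blob(bytes32) VALUES (zeroblob(32));
SELECT rowid FROM blob WHERE bytes32 = zeroblob(32);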
CREATE TABLE coin_record_new(coin_name_id int, confirmed_index bigint, spent_index bigint, puzzle_hash_id int, coin_parent_id int, amount int, timestamp bigint);
Instead of indexing on confirmed_index and spent_index, create a new table
coin_lifecycle that has a row for each block, with a blob that lists all
created and destroyed coins (by id) for that block. The bonus is that this is
committed to by blockchain data, so with care it can be verified against the
hash in a block. (Or is this deprecated?)
CREATE TABLE coin_lifecycle(block_index bigint PRIMARY KEY, created_coin_list blob, destroy_coin_list blob) WITHOUT ROWID;
[ or maybe
CREATE TABLE coin_create_log(block_index bigint, created_coin_id int)
CREATE TABLE coin_spend_log(block_index bigint, spent_coin_id int)
]
The indices on puzzle_hash and coin_parent are only needed to service wallet
queries.
CREATE TABLE coin_puzzle_hash_lookup(puzzle_hash_rowid int, coin_record_id int)
CREATE TABLE coin_parent_lookup(parent_hash_rowid int, coin_record_id int)
We could explore making them optional, and creating a separate table that
indexes each item separately, using coin_record as a foreign key. Keep each
table to at most one index; that seems to be most beneficial for speeding up
writes.
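As an illustration, a hypothetical wallet query against the tables sketched
above (all names follow the sketches; ?1 is a 32-byte puzzle hash, and
coin_puzzle_hash_lookup would carry its single index on puzzle_hash_rowid so
this doesn't require a full scan):
SELECT cr.rowid, cr.confirmed_index, cr.spent_index
FROM coin_record_new cr
JOIN coin_puzzle_hash_lookup l ON l.coin_record_id = cr.rowid
WHERE l.puzzle_hash_rowid = (SELECT rowid FROM blob WHERE bytes32 = ?1);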