We are modifying the chia-blockchain project so that RocksDB can optionally be used instead of an SQLite database. RocksDB's LSM-tree structure mitigates the performance degradation that we observe with SQLite's B-tree as the coin database grows larger.
- prototype suggests RocksDB can be used for consensus-critical data with a simple schema and a simple API using just write batches rather than full transactions
- encapsulation of DBWrapper2 in SQLite3-based storage wrappers
- migrate from `CoinStore`, `BlockStore`, `HintStore` to `ConsensusStoreProtocol`, `BlockArchiveStoreProtocol`, and `ExplorerStoreProtocol` using existing code paths to create `ConsensusStoreSQLite3`, `BlockArchiveStoreSQLite3`, and `ExplorerStoreSQLite3` implementations
- add consistency checks across stores at a higher layer to allow stores to have independent transaction boundaries
- `ConsensusStoreRocksDB` implementation
- future of `BlockArchiveStore` and `ExplorerStore` TBD
A prototype has been running consensus for several months without issue. This prototype removes the `coin_record` table and implements just enough methods in `CoinStoreV3` to allow consensus to continue. The block archive remains in SQLite, but a simplified version of the coin records is stored in a RocksDB database with only enough information to support consensus operations. Data for explorer-like calls, such as those that wallets tend to make, is missing, so those `CoinStoreV3` methods raise an exception. This proof-of-concept branch (forked May 2025, so quite out of date) is at https://github.com/richardkiss/chia-blockchain/tree/db_v3.
We want to separate concerns so that nodes don't have to be monolithic, but rather can choose whether to maintain a full block archive or explorer-like functions for wallet support (parent coin indices and puzzle hash indices).
Persistent data currently goes through `CoinStore`, `BlockStore` and `HintStore`, along with a `BlockHeightCache` (which contains easily rebuilt cached block data). Rather than separating by data type, we want to separate by function: `ConsensusStore`, `BlockArchiveStore`, and `ExplorerStore`. This enables different node types: minimal consensus-only nodes, explorer nodes for wallet services, and full archive nodes for complete history.
Aside: to support partial archive nodes, a fork that restricts generator references to a finite depth would be extremely helpful. I strongly recommend this be done for this reason, and also to mitigate potential DoS attacks. Separating stores by function also allows us to put consensus functions into a RocksDB database while leaving archive and explorer functions in SQLite, or in a separate non-consensus-critical RocksDB database, to better expose resource requirements for each feature.
Creating `ConsensusStoreProtocol`, `BlockArchiveStoreProtocol`, and `ExplorerStoreProtocol` interfaces allows us to define clear APIs for each function. The existing implementation will be wrapped in `ConsensusStoreSQLite3`, `BlockArchiveStoreSQLite3`, and `ExplorerStoreSQLite3` classes that use DBWrapper2 under the covers and avoid changes to existing code paths where possible.
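As a rough illustration of what function-oriented interfaces might look like, here is a minimal sketch using `typing.Protocol`. Every method name and the `MinimalCoinRecord` fields are assumptions made for illustration; the actual interfaces are still to be defined.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class MinimalCoinRecord:
    # Hypothetical: only the fields consensus might need.
    coin_id: bytes
    confirmed_height: int
    spent_height: int  # 0 is used here to mean "unspent"


@runtime_checkable
class ConsensusStoreProtocol(Protocol):
    # Hypothetical method names, not the project's real API.
    async def get_coin_record(self, coin_id: bytes) -> Optional[MinimalCoinRecord]: ...
    async def get_peak_height(self) -> Optional[int]: ...


@runtime_checkable
class ExplorerStoreProtocol(Protocol):
    # Hypothetical wallet-support lookups (parent and puzzle hash indices).
    async def get_coin_ids_by_puzzle_hash(self, puzzle_hash: bytes) -> List[bytes]: ...
    async def get_coin_ids_by_parent(self, parent_id: bytes) -> List[bytes]: ...


class InMemoryConsensusStore:
    """Toy stand-in for a ConsensusStoreSQLite3 or ConsensusStoreRocksDB backend."""

    def __init__(self) -> None:
        self._coins: Dict[bytes, MinimalCoinRecord] = {}
        self._peak: Optional[int] = None

    async def get_coin_record(self, coin_id: bytes) -> Optional[MinimalCoinRecord]:
        return self._coins.get(coin_id)

    async def get_peak_height(self) -> Optional[int]:
        return self._peak
```

A consensus-only node would depend only on `ConsensusStoreProtocol`; the toy store above satisfies it structurally without implementing any explorer lookups.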
DBWrapper2 has grown complex over time. Despite its name, it is closely bound to SQLite and really acts as a wrapper around sqlite3. The resource it wraps, an sqlite3 connection, does not transfer easily to RocksDB and exposes an arbitrary SQL-based API. Using it as the base when moving to RocksDB would add even more complexity, and the cross-table consistency promises it makes can only be kept if all tables are in a single SQLite database.
Much of the complexity of DBWrapper2 seems to be motivated by consistency problems with the wallet. For consensus, the database writes seem contained and predictable.
Now that the chia full node is fairly mature and its data storage needs are well-understood, we can move to a new pattern: rather than the `sqlite3.connection` object provided by `DBWrapper2`, the `ConsensusStoreProtocol` will have an async context manager `write()` method that yields a made-for-chia-data object conforming to `ConsensusStoreWriteProtocol`. This provides transaction semantics around all mutating methods like `.write_xx()` and `.rewind_to_height`. A similar pattern can be used for `ConsensusReadProtocol` if it turns out to be necessary. Similar semantics can be used for `BlockArchiveStore` and `ExplorerStore`.
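The `write()` pattern could be sketched roughly as follows. This is an in-memory toy, not the proposed implementation: `ToyConsensusWriter`, `write_coin` (standing in for the `.write_xx()` methods), and `snapshot` are hypothetical names, and a plain list plays the role that a write batch would play in RocksDB. Mutations are staged while the context is open and applied together only if the block exits without an exception.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, List, Optional, Tuple


class ToyConsensusWriter:
    """Stages mutations in a batch; nothing is applied until the context exits."""

    def __init__(self, committed: Dict[bytes, int]) -> None:
        self._committed = committed
        self.batch: List[Tuple[bytes, Optional[int]]] = []

    def write_coin(self, coin_id: bytes, confirmed_height: int) -> None:
        self.batch.append((coin_id, confirmed_height))

    def rewind_to_height(self, height: int) -> None:
        # Stage deletion of every committed coin confirmed above `height`.
        # (This toy ignores writes staged earlier in the same batch.)
        for coin_id, confirmed in self._committed.items():
            if confirmed > height:
                self.batch.append((coin_id, None))


class ToyConsensusStore:
    def __init__(self) -> None:
        self._coins: Dict[bytes, int] = {}  # coin_id -> confirmed height

    def snapshot(self) -> Dict[bytes, int]:
        return dict(self._coins)

    @asynccontextmanager
    async def write(self) -> AsyncIterator[ToyConsensusWriter]:
        writer = ToyConsensusWriter(self._coins)
        yield writer  # an exception in the caller's block propagates from here
        # Reached only on a clean exit: apply the whole batch in one step.
        for coin_id, confirmed in writer.batch:
            if confirmed is None:
                self._coins.pop(coin_id, None)
            else:
                self._coins[coin_id] = confirmed


async def demo() -> Dict[bytes, int]:
    store = ToyConsensusStore()
    async with store.write() as w:
        w.write_coin(b"a", 1)
        w.write_coin(b"b", 2)
    try:
        async with store.write() as w:
            w.write_coin(b"c", 3)
            raise RuntimeError("simulated failure")  # batch is discarded
    except RuntimeError:
        pass
    async with store.write() as w:
        w.rewind_to_height(1)  # removes coin b"b"
    return store.snapshot()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # {b'a': 1}
```

Because callers only ever see the writer object, the same calling code could sit on top of an SQLite transaction or a RocksDB write batch without exposing either one.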
(Note that the prototype linked above shoehorns RocksDB into `DBWrapper2`.)
Cross-store consistency, which DBWrapper2 currently provides by creating a transaction that wraps both `CoinStore` and `BlockStore`, can be handled at a higher application-specific level.
We will define an update ordering policy (for example, commit to `ConsensusStoreProtocol`, then `BlockArchiveStoreProtocol`, then `ExplorerStoreProtocol`). A startup reconciliation process that knows the order in which the stores are updated can then ensure that all stores are consistent.
It might apply one of these two strategies (which one makes the most sense will be determined later):
- replaying updates that do not appear to have been completed across all stores
OR
- rewinding to the highest common height that appears consistent across all three stores, and restarting the sync from there
This start-up reconciliation can be run even with existing SQLite DBWrapper2 deployments. Because a DBWrapper2 transaction wraps all legacy stores, reconciliation should never be necessary there, but a correct check should still be harmless.
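The second strategy (rewind to the highest common height) might look roughly like the sketch below. It assumes each store can report a peak height and rewind itself; `RewindableStore`, `ToyStore`, and `reconcile_by_rewind` are hypothetical names. It also compares heights only, whereas a real check would presumably compare block hashes as well.

```python
from typing import List, Protocol


class RewindableStore(Protocol):
    # Hypothetical minimal interface needed for reconciliation.
    def peak_height(self) -> int: ...
    def rewind_to_height(self, height: int) -> None: ...


class ToyStore:
    """In-memory stand-in that just remembers which heights it has committed."""

    def __init__(self, name: str, heights: List[int]) -> None:
        self.name = name
        self.heights = list(heights)

    def peak_height(self) -> int:
        return max(self.heights, default=-1)  # -1 means "empty"

    def rewind_to_height(self, height: int) -> None:
        self.heights = [h for h in self.heights if h <= height]


def reconcile_by_rewind(stores_in_commit_order: List[RewindableStore]) -> int:
    """Rewind every store to the lowest peak and return that height.

    With a fixed commit order, a crash between commits leaves earlier
    stores ahead of later ones, so the lowest peak is the highest height
    that all stores have in common.
    """
    common = min(store.peak_height() for store in stores_in_commit_order)
    for store in stores_in_commit_order:
        if store.peak_height() > common:
            store.rewind_to_height(common)
    return common
```

For example, if the consensus store reached height 5, the archive store height 4, and the explorer store height 3 before a crash, all three are rewound to height 3 and the sync restarts from there. When every store already shares the same peak, as should always be the case under a single DBWrapper2 transaction, the function changes nothing.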
The migration path is as follows:
- Continue to refactor `full_node` so all database calls use the new Store classes instead of the old ones. Continue to use DBWrapper2 under the covers to minimize logic changes, but do not use it directly except in `CoinStoreSQLite3`/`BlockStoreSQLite3` or maybe `ConsensusStoreSQLite3` implementations.
- deprecate the old store classes and bury references to them (and DBWrapper2) in `*StoreSQLite3` implementations
- add an option to use a `ConsensusStoreRocksDB` implementation
- (maybe) implement `BlockArchiveStoreRocksDB` and `ExplorerStoreRocksDB` as separate RocksDB instances with configurable enable/disable options
- release a version of chia-blockchain that allows opt-in RocksDB support
- once confidence is sufficient, switch to RocksDB as the default for new installations
- provide a migration path for existing installations.
- remove SQLite support (for consensus)
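The opt-in step could be as small as a factory keyed on a configuration value. In the sketch below the config key `consensus_store_backend`, the function `create_consensus_store`, and the placeholder class bodies are all assumptions for illustration; they are not existing chia-blockchain options or classes.

```python
from typing import Any, Callable, Dict, Mapping


class ConsensusStoreSQLite3:
    """Placeholder for the DBWrapper2-backed implementation."""

    def __init__(self, path: str) -> None:
        self.path = path


class ConsensusStoreRocksDB:
    """Placeholder for the RocksDB-backed implementation."""

    def __init__(self, path: str) -> None:
        self.path = path


# Hypothetical registry of available backends.
BACKENDS: Dict[str, Callable[[str], Any]] = {
    "sqlite3": ConsensusStoreSQLite3,
    "rocksdb": ConsensusStoreRocksDB,
}

# Flipping this constant later would make RocksDB the default for
# installations that do not set the option explicitly.
DEFAULT_BACKEND = "sqlite3"


def create_consensus_store(config: Mapping[str, Any], path: str) -> Any:
    """Pick a backend from a (hypothetical) config key, defaulting to SQLite."""
    name = config.get("consensus_store_backend", DEFAULT_BACKEND)
    try:
        factory = BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown consensus store backend: {name!r}") from None
    return factory(path)
```

Keeping the choice behind one factory means the rest of `full_node` never needs to know which backend is in use, which is what lets the default change later without touching callers.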