The Tokyo Cabinet B+tree database (a.k.a TCBDB
) is implemented on-top of the
hash database (TCHDB
) that acts as the persistence layer.
Note: the same occurs with Tokyo table database (
TCTDB
) which is also a wrapper around theTCHDB
(the table database can be thought as a document oriented database with querying capabilities).
The TCBDB
features (among other things):
- an in-memory caching layer (for internal and leaves nodes) that can be tuned,
- an user-defined key comparison function (lexical by default),
- various iteration schemes via a convenient cursor,
- record duplication.
Note: record duplication is particularly convenient. See this gist for a concrete illustration, or redisk that uses it to implement Redis list data structure.
A B+ tree DB is made of:
-
Leaves:
uint32_t lmemb
(a.k.a leaf member): controls the maximum number of records that can be stored into a leaf, i.e. the leaf's page capacity- minimum value: 4 (see
BDBMINLMEMB
) - default value: 128 (see
BDBDEFLMEMB
)
- minimum value: 4 (see
uint32_t lsmax
: the maximum size of each leaf, i.e. the maximum amount of data that can be recorded into the page- default value: 16kb (see
BDBDEFLSMAX
) - minimum value: 0.5kb (see
BDBMINLSMAX
)
- default value: 16kb (see
-
Nodes:
uint32_t nmemb
(a.k.a node member): maximum number of members in each node, i.e. the indices list maximum capacity- minimum value: 4 (see
BDBMINLMEMB
) - default value: 256 (see
BDBDEFLMEMB
)
- minimum value: 4 (see
-
A cache for in-memory storage:
TCMAP *leafc
: an LRU cache for the leavesuint32_t lcnum
: the maximum number of cached leaves- default value: 1024 (see
BDBDEFLCNUM
) - minimum value: 64 (see
BDBLEVELMAX
)
TCMAP *nodec
: an LRU cache for the internal nodesuint32_t ncnum
: the maximum number of cached nodes- default value: 512 (see
BDBDEFNCNUM
) - minimum value: 64 (see
BDBLEVELMAX
)
-
An history array for visited nodes:
uint64_t *hist
: the history array of visited nodes of size 64 (max level of B+ tree)int hnum
: number of element of the history array
-
A comparison function for keys ordering:
TCCMP cmp
-
An internal key/value database (a.k.a hash database) for persistent storage:
TCHDB *hdb
: the underlying Hash DB (both internal and leaves nodes have an unique ID and are stored persistently into this internal DB)bnum
: the bucket number of the underlying Hash DB- default value: 32749 (see
BDBDEFBNUM
)
- default value: 32749 (see
xmsiz
: the memory mapped size of the underlying Hash DB- default value: 0 (see
tcbdbnew
:tchdbsetxmsiz(bdb->hdb, 0))
- default value: 0 (see
apow
: default alignment power of the underlying Hash DBfpow
: default free block pool power of the underlying Hash DB
As more and more (key, value) pairs are inserted, more and more records are added to the leaf's page.
This leads to a leaf divide:
- if the number of records is higher than
lmemb
(max records per leaf) - if the leaf size is higher than
lsmax
(max data size per leaf)
The bucket number should be computed according to:
- the total number of records to be stored (or expected if not known),
- the maximum number of records per leaf.
e.g. for rnum = 1M
(key, value)-s total number of records and lmemb = 256
records per leaf, this requires rnum / lmemb
leaves to be stored into the
underlying database, hence the bucket number should be a multiple, e.g.
bnum = 2 x rnum / lmemb.
Using a large multiple (e.g. 8 times) to obtain a large bucket number is recommended to lower the risk of hash values collisions within the underlying DB.
Increasing the cache size capacity through lcnum
and ncnum
parameters greatly
speeds-up the querying since for each cached leaf/node it allows to load it from
the in-memory storage avoiding 2 costly operations:
- leaf lookup, i.e. search the leaf object corresponding to the input key
(see
tcbdbsearchleaf
) - leaf load, i.e. retrieve the leaf by querying the internal DB (see
tcbdbleafload
)
In other words, to speed-up the runtime querying, it is recommended to choose
lcnum
and ncnum
parameters as high as possible.
If not all nodes can be kept in-memory, one must remember that the internal cache works in an LRU fashion, i.e. the most frequently requested records are kept in-memory.
- Official introduction to Tokyo Cabinet features
- Official Tokyo products slides - see slide 7.