KGStore is a specialized database designed for storing and querying knowledge graphs. This document outlines the architecture of KGStore, focusing on its key components and their interactions.
KGStore is a single-file database for knowledge graphs that uses:
- Skip list-based in-memory tables and sorted string tables for data organization
- Page-based storage for efficient I/O
- Log-structured merge (LSM) tree for high write throughput
- Write-Ahead Logging (WAL) for durability and crash recovery
- Bloom filters for efficient key lookups
The database is built with a focus on:
- Fast graph traversals
- Efficient property lookups
- ACID (Atomicity, Consistency, Isolation, Durability) compliance
- High write throughput
- Low read latency
Nodes represent entities in the knowledge graph:
- Each node has a unique ID
- Nodes have a label (type/category)
- Nodes can have key-value properties
type Node struct {
ID uint64 // Unique identifier for the node
Label string // Type or category of the node
Properties map[string]string // Key-value pairs of node properties
}
Edges represent relationships between nodes:
- Each edge connects two nodes (source and target)
- Edges have a label (relationship type)
- Edges can have key-value properties
type Edge struct {
SourceID uint64 // ID of the source node
TargetID uint64 // ID of the target node
Label string // Type or category of the relationship
Properties map[string]string // Key-value pairs of edge properties
}
The database uses a page-based storage system:
- Pages are fixed-size blocks of data (the fundamental unit of I/O)
- Each page has a unique ID
- Pages can be flagged as dirty, pinned, or indexed
- Pages are used for both data and index structures
type Page struct {
ID uint64 // Unique identifier for the page
Data []byte // Raw data stored in the page
Flags PageFlag // Status flags for the page
}
The MemTable is an in-memory data structure that buffers recent writes:
- Implemented as a skip list (O(log n) operations)
- Sorted key-value store
- Thread-safe with mutex protection
- Size-bounded (configurable max size)
- When full, becomes immutable and is flushed to disk as an SSTable
type MemTable struct {
head *skipNode // Pointer to the head node
maxHeight int // Maximum height of skip list
currentHeight int // Current height of the skip list
size uint64 // Total size in bytes
count int // Number of entries
maxSize uint64 // Maximum size in bytes
isFlushed bool // Whether the MemTable has been flushed
comparator Comparator // Function for comparing keys
currentVersion uint64 // Version counter for tracking changes
}
SSTables are immutable on-disk files containing sorted key-value pairs:
- Each SSTable consists of:
- Data file: contains actual key-value pairs
- Index file: enables binary search lookup
- Bloom filter file: enables quick non-membership tests
- SSTables are organized in a level-based structure for efficient compaction
- Format includes a header with metadata (magic number, version, key count, min/max keys)
type SSTable struct {
id uint64 // Unique identifier for this SSTable
path string // Path to the SSTable directory
dataFile string // Path to the data file
indexFile string // Path to the index file
filterFile string // Path to the bloom filter file
keyCount uint32 // Number of key-value pairs
dataSize uint64 // Size of the data file in bytes
minKey []byte // Minimum key in the SSTable
maxKey []byte // Maximum key in the SSTable
indexCache map[string]uint64 // Cache for index lookups
}
The WAL provides durability and crash recovery:
- Records all write operations before they are applied to the MemTable
- Each record includes:
- Record type (PUT or DELETE)
- Key and value
- Timestamp
- CRC32 checksum for integrity
- During recovery, the WAL is replayed to restore the state of the MemTable
type WALRecord struct {
Type RecordType
Key []byte
Value []byte
Timestamp int64
}
Bloom filters provide efficient membership tests for SSTables:
- Probabilistic data structure that allows for fast non-membership checks
- False positives are possible but false negatives are not
- Configurable false positive rate
- Used to quickly skip SSTables that don't contain a key
type BloomFilter struct {
bits []byte // The bit array
size uint64 // The size of the bit array in bits
hashFuncs uint64 // The number of hash functions
expectedN uint64 // The expected number of elements
insertions uint64 // The number of elements inserted
}
KGStore uses a leveled compaction strategy to manage SSTables:
- SSTables are organized into multiple levels (typically 7 levels)
- Level 0 contains newest SSTables, which may have overlapping key ranges
- Higher levels contain older SSTables with non-overlapping key ranges
- When level N gets too large, some SSTables are merged with level N+1
- Each level is exponentially larger than the previous (typically 10x)
This approach:
- Reduces write amplification
- Spreads out I/O load
- Improves read performance by reducing the number of SSTables to search
The query engine provides operations for traversing the knowledge graph:
Executes parsed queries against the storage engine:
- Handles various query types (find nodes/edges by label, find neighbors, find paths)
- Utilizes indexes for efficient lookups
- Returns structured results
Optimizes query plans for better performance:
- Reorders operations to minimize intermediate results
- Chooses the most efficient indexes
- Applies query-specific optimizations
Handles graph traversal operations:
- Breadth-first and depth-first traversals
- Path finding between nodes
- Neighborhood exploration
- When a key is requested, the system checks:
- Current MemTable
- Immutable MemTables (newest to oldest)
- SSTables (newest to oldest at each level)
- For SSTable lookups:
- Bloom filters are checked first to quickly skip SSTables
- Binary search is used on the index file to find the key's position
- The data file is accessed at the specific offset to retrieve the value
- Each write operation (put/delete) is:
- Recorded in the WAL
- Applied to the current MemTable
- When the MemTable reaches its size limit:
- It is marked as immutable
- A new MemTable is created for future writes
- The immutable MemTable is scheduled for flushing
- Flushing converts an immutable MemTable to an SSTable:
- Key-value pairs are written to a new SSTable
- The SSTable is added to Level 0
- Compaction periodically merges SSTables:
- Level 0 SSTables are merged into Level 1
- If a level exceeds its size threshold, SSTables are merged into the next level
- Merging resolves duplicate keys (newer values override older ones)
- During database startup:
- Existing SSTables are loaded
- The WAL is replayed to recover recent operations
- Any immutable MemTables are flushed to disk
- Log-structured approach provides high write throughput
- In-memory MemTable for fast writes
- Batched disk I/O through SSTable creation
- Write amplification is minimized through careful compaction
- Skip list MemTable for O(log n) lookups
- Bloom filters to quickly skip irrelevant SSTables
- Binary search on SSTable indexes
- Level-based organization to reduce the number of SSTables to search
- MemTable and index caching for frequently accessed data
- Compaction removes duplicate and deleted keys
- Leveled structure balances space usage and compaction frequency
- Configurable bloom filter false positive rates to balance space and performance
KGStore's architecture combines several proven database techniques with specialized optimizations for knowledge graph operations. The LSM tree structure provides excellent write performance, while the various indexing strategies ensure efficient reads and traversals. The combination of WAL and carefully structured SSTables provides durability and reliability.