The storage engine is the component of the database responsible for managing how data is stored.
REFRESH MEMORY: data is stored both in memory and on disk.
Different storage engines perform better for specific workloads. Choosing the right one WILL SIGNIFICANTLY impact performance.
To check if your MongoDB instance is using WiredTiger:
db.serverStatus().storageEngine
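Sample output on a typical installation (the exact field set varies by version; the important field is name):
{ name: 'wiredTiger', supportsCommittedReads: true, persistent: true, ... }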
WiredTiger uses document-level concurrency control: multiple clients can modify different documents of a collection at the same time.
Intent shared (IS) lock: read operations that do not change or update data, such as find() queries.
Intent exclusive (IX) lock: write operations, such as save(), updateOne(), updateMany(), ...
Intent locks do not block each other: IS does not block IX, and multiple IS or IX locks can be held at the same time.
Conflicts are detected at the document level instead: when two operations try to modify the same document, one of them gets a write conflict and MongoDB transparently (implicitly) retries that operation.
Database-level and collection-level locks are still required for some special operations (e.g. dropping a collection).
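To see how often each lock mode has been acquired, you can look at the locks section of serverStatus (r/w are intent shared/exclusive, R/W are plain shared/exclusive); a quick check in mongosh:
db.serverStatus().locks.Collection.acquireCount   // e.g. { r: ..., w: ... }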
By default, WiredTiger uses:
- block compression with the snappy compression library for all collections (a per-collection override is shown below)
- prefix compression for all indexes
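Block compression can also be overridden per collection at creation time. A minimal mongosh sketch, assuming a hypothetical events collection and switching from the default snappy to zlib:
db.createCollection("events", {
  storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
})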
WiredTiger internal cache size:
default = MAX((physical_ram - 1GB) * 50%, 256MB)
E.g. with 4GB RAM, the WiredTiger cache will use 0.5 * (4GB - 1GB) = 1.5GB.
A system with 1.25GB of RAM will allocate 256MB, since 0.5 * (1.25GB - 1GB) = 128MB < 256MB.
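The same arithmetic as plain JavaScript (runs in mongosh), just to make the MAX(...) rule concrete; defaultWtCacheGB is an illustrative helper, not a MongoDB API:
// default WiredTiger internal cache size (GB) for a given amount of physical RAM (GB)
function defaultWtCacheGB(ramGB) { return Math.max(0.5 * (ramGB - 1), 0.25); }
defaultWtCacheGB(4)     // 1.5
defaultWtCacheGB(1.25)  // 0.25, i.e. the 256MB floor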
Filesystem cache
- Data in the filesystem cache has the same format as the on-disk data files.
- Reduces disk I/O, since the OS can serve data directly from the cache.
Indexes (in the WiredTiger internal cache)
- Have a different representation in cache than the on-disk format.
- Still keep the advantage of index prefix compression to reduce their size => reduces RAM usage. (Index prefix compression deduplicates common prefixes from indexed fields.)
Now that's interesting! If the indexed values do not differ much and share a common prefix, they take up less space when compressed.
Collection data (in the WiredTiger internal cache)
- Uses a different representation in cache than the on-disk format.
- On-disk data is compressed with block compression, but cache data stays uncompressed so the server can manipulate it directly (see the stats example below).
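The effect of block compression is visible by comparing the uncompressed data size with the size allocated on disk; for example in mongosh (the collection name is just an example):
const s = db.events.stats()
s.size         // uncompressed data size in bytes
s.storageSize  // compressed size allocated on disk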
storage.wiredTiger.engineConfig.cacheSizeGB
or --wiredTigerCacheSizeGB
Note: Avoid increasing the cache size above its default value
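To confirm what the running mongod actually allocated for the internal cache (default or explicitly configured), check the wiredTiger.cache section of serverStatus:
db.serverStatus().wiredTiger.cache["maximum bytes configured"]   // e.g. 1610612736 (1.5GB)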
When an operation starts, WiredTiger provides it with a point-in-time snapshot of the data, rather than having it read documents and indexes directly from the data files.
This snapshot presents a consistent view of the in-memory data.
Durable write: when writing to disk, WiredTiger writes all of the snapshot's data across all data files in a consistent way (consistent how? see the checkpoint note below). This data is now durable.
Durable only if:
- for a standalone MongoDB instance, the write operation has been written to the server's journal file,
- for a replica set, the write operation has been written to the journal files of a majority of voting nodes (see the write-concern sketch below).
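This maps directly to write concern on the client side: j: true asks for acknowledgment only after the write is in the on-disk journal, and w: "majority" extends that to a majority of voting nodes. A minimal sketch (collection and document are made up):
db.events.insertOne(
  { type: "login", at: new Date() },
  { writeConcern: { w: "majority", j: true } }
)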
The durable data acts as a checkpoint in the data files. The checkpoint ensures the data files are consistent up to and including the last checkpoint, so a checkpoint can act as a recovery point.
While a new checkpoint is being written, the previous checkpoint is still kept. If a failure or error occurs while writing the new checkpoint, MongoDB can recover after a restart using the previous (still valid) checkpoint.
With WiredTiger, even without journaling, MongoDB can recover from the last checkpoint. But to recover changes made after the last checkpoint, journaling is required.
Note: Replica set members that use the WiredTiger storage engine ALWAYS use journaling.
Old snapshot history cannot be kept for long: if minSnapshotHistoryWindowInSeconds
is set too high, snapshot history is retained for a long time and disk usage grows.
Keep it low enough to avoid expensive storage usage, but high enough for the consistency your reads need.
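minSnapshotHistoryWindowInSeconds is a runtime server parameter, so it can be inspected and changed without a restart (300 below is just an example value):
db.adminCommand({ getParameter: 1, minSnapshotHistoryWindowInSeconds: 1 })
db.adminCommand({ setParameter: 1, minSnapshotHistoryWindowInSeconds: 300 })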
The WiredTiger journal persists all data modifications between checkpoints. If mongod exits between checkpoints (internal error, power outage, ...), after a restart it uses the journal to replay all modifications made since the last checkpoint.
This does mean MongoDB performs one extra write per operation: to the journal. The cost stays manageable because journal records are buffered in memory and synced to disk periodically (every 100ms by default, via storage.journal.commitIntervalMs), or immediately for writes that request j: true.
The journal files are compressed with the snappy compression library (the same library used for block compression of collection data).
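Journal activity is also exposed through serverStatus, under the wiredTiger.log section; exact counter names can vary slightly between versions, but typically include:
const log = db.serverStatus().wiredTiger.log
log["log bytes written"]      // total bytes written to the journal
log["log sync operations"]    // how often the journal was synced to disk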
Only for MongoDB Enterprise