Created
November 4, 2023 05:57
Reclaim Disk Space from MongoDB
1 - Applies to MongoDB 3.2 or later. All of the following operations run in the Mongo shell.
First, confirm that your MongoDB storage engine is WiredTiger:
```js
db.serverStatus().storageEngine
// expected output includes: { "name" : "wiredTiger" }
```
2 - Now run show dbs to find the largest databases. Then switch to each of them (use followed by the database name) and run the following to find its largest collections:
```js
// You can call this function multiple times
// during the current session.
function CollectionSizes(collectionNames) {
  let stats = []
  collectionNames.forEach(function (n) {
    // Passing a scale of 1024^3 makes stats() report sizes in GB
    stats.push(db[n].stats(1024 * 1024 * 1024))
  })
  // Largest data size first
  stats = stats.sort(function (a, b) {
    return b['size'] - a['size']
  })
  print(`name: data size in GB (storage size on disk in GB)`)
  for (let c of stats) {
    print(`${c['ns']}: ${c['size']} (${c['storageSize']})`)
  }
}
CollectionSizes(db.getCollectionNames())
```
3 - MongoDB seldom returns disk space to the operating system when documents are deleted. WiredTiger keeps the freed pages inside its data files and reuses them for future writes. The compact command makes WiredTiger release this space back to the operating system (similar in spirit to PostgreSQL's VACUUM).
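Before compacting, it can help to estimate how much space each collection is holding for reuse. The sketch below assumes the collection statistics expose a wiredTiger section whose block-manager entry contains "file bytes available for reuse", which is where WiredTiger usually reports it. The helper names reclaimableBytes and rankByReclaimable are made up for this example, and a collection without that section is treated as having nothing to reclaim.

```js
// Sketch: estimate how much space compact could return per collection.
// `stats` is assumed to be the document returned by db.collection.stats()
// called without a scale argument, so values are in bytes.
function reclaimableBytes(stats) {
  const wt = stats.wiredTiger || {}
  const blockManager = wt['block-manager'] || {}
  return blockManager['file bytes available for reuse'] || 0
}

// Rank collections by reclaimable bytes, largest first.
function rankByReclaimable(statsList) {
  return statsList
    .map(function (s) {
      return {
        ns: s.ns,
        storageSize: s.storageSize,
        reclaimable: reclaimableBytes(s)
      }
    })
    .sort(function (a, b) {
      return b.reclaimable - a.reclaimable
    })
}

// Assumed usage in the shell:
// rankByReclaimable(db.getCollectionNames().map(function (n) {
//   return db[n].stats()
// }))
```

Collections near the top of the result are probably the ones where compact pays off most; collections with little reusable space can likely be skipped.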
Compact must be run on every node in the cluster, because each node keeps its own data files.
Before MongoDB 4.4, compact locks the whole database, and replication also stops while it runs on a secondary node. (Starting with MongoDB 4.4, compact blocks only some metadata operations, such as dropping the collection or creating and dropping indexes.) On a single-node MongoDB, run compact during a scheduled maintenance window.
In a replica set, however, you can do a rolling compact with no system downtime. Start with the secondary nodes (don't forget rs.slaveOk(), or rs.secondaryOk() in newer shells), then run rs.stepDown() on the primary and compact it once it has become a secondary.
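The rolling order described above can be sketched as a small helper that takes the members array from rs.status() and returns the hosts in the order to compact them: secondaries first, the current primary last. The function name rollingCompactOrder is hypothetical, and the sketch assumes each member carries the name and stateStr fields that rs.status() normally reports. Arbiters are left out because they hold no data.

```js
// Sketch: decide the order for a rolling compact.
// `members` is assumed to be rs.status().members.
function rollingCompactOrder(members) {
  const secondaries = members
    .filter(function (m) { return m.stateStr === 'SECONDARY' })
    .map(function (m) { return m.name })
  const primaries = members
    .filter(function (m) { return m.stateStr === 'PRIMARY' })
    .map(function (m) { return m.name })
  // Secondaries first; the primary goes last, after rs.stepDown().
  return secondaries.concat(primaries)
}

// Assumed usage in the shell:
// rollingCompactOrder(rs.status().members)
```

This only produces the list of hosts. Connecting to each one, running compact there, and stepping down the primary remain manual steps.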
The following script compacts every collection in every non-system database, and sleeps after long-running compacts to reduce oplog replication lag:
```js
function compactDB(dbName) {
  // Skip system databases ('config' is excluded here as well, since it
  // holds internal metadata rather than application data)
  if ('local' !== dbName && 'admin' !== dbName && 'config' !== dbName && 'system' !== dbName) {
    let subject = db.getSiblingDB(dbName)
    subject.getCollectionNames().forEach(function (collectionName) {
      let taskName = dbName + ' - ' + collectionName
      let startAt = new Date()
      print('compacting: ' + taskName)
      let result = subject.runCommand({compact: collectionName})
      if (!result.ok) {
        print('compact failed for ' + taskName + ': ' + tojson(result))
      }
      let elapsed = ((new Date()) - startAt) / 1000
      print(taskName + ', finished in ' + elapsed + ' second(s).')
      // After a long compact, pause so replication can catch up
      if (elapsed > 30) {
        print('sleep a while for OpLog sync...')
        sleep(8000)
      }
    })
  }
}
db.getMongo().getDBNames().forEach(compactDB)
```
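To check what the compaction achieved, one option is to record each collection's storageSize before and after and compare the two. The sketch below is one possible way to do that; snapshotStorage and reclaimedReport are hypothetical helper names, and the input is assumed to be a list of documents from db.collection.stats() taken without a scale, so the values are in bytes.

```js
// Sketch: compare on-disk sizes before and after compact.
// Build a map of namespace -> storageSize from a list of stats documents.
function snapshotStorage(statsList) {
  const snapshot = {}
  statsList.forEach(function (s) {
    snapshot[s.ns] = s.storageSize
  })
  return snapshot
}

// Report reclaimed bytes per namespace, largest gain first.
// Namespaces missing from `after` (for example dropped collections)
// are ignored.
function reclaimedReport(before, after) {
  return Object.keys(before)
    .filter(function (ns) { return ns in after })
    .map(function (ns) {
      return { ns: ns, reclaimed: before[ns] - after[ns] }
    })
    .sort(function (a, b) { return b.reclaimed - a.reclaimed })
}

// Assumed usage in the shell, for the current database:
// let before = snapshotStorage(db.getCollectionNames().map(function (n) { return db[n].stats() }))
// ... run compact ...
// let after = snapshotStorage(db.getCollectionNames().map(function (n) { return db[n].stats() }))
// reclaimedReport(before, after)
```

A reclaimed value of zero does not necessarily mean compact failed; the collection may simply have had little free space to give back.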