@vhbui02
Created May 17, 2023 17:27
[MongoDB Clustered Collections] #mongodb

CLUSTERED COLLECTIONS

A Collection created with a clustered index.

Pros

    1. Faster queries without a secondary index: range and equality comparisons can use the clustered index key directly.
    2. Lower storage size, which improves performance for queries and bulk inserts.
    3. Can eliminate the need for a secondary TTL index: a clustered index also acts as a TTL index when the expireAfterSeconds option is set and the _id field is a supported date type. Expiring short-lived Documents this way improves delete performance and reduces storage size.
    4. Additional performance improvements for inserts, updates, deletes and queries.
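
The TTL point above can be sketched in mongosh roughly as follows. This is a minimal sketch under assumptions, not a definitive setup: the Collection name `events`, the index name, the helper `clusteredTtlOptions` and the one-hour lifetime are all made up for illustration. The `typeof db` guard only lets the snippet load outside mongosh, where no `db` global exists.

```javascript
// Build createCollection() options for a clustered collection whose
// documents expire. The helper name and "ttlSeconds" are hypothetical.
function clusteredTtlOptions(indexName, ttlSeconds) {
  return {
    clusteredIndex: { key: { _id: 1 }, unique: true, name: indexName },
    expireAfterSeconds: ttlSeconds
  };
}

const ttlOptions = clusteredTtlOptions("events clustered key", 3600);

// In mongosh `db` is a global; elsewhere this branch is skipped.
if (typeof db !== "undefined") {
  db.createCollection("events", ttlOptions);
  // For TTL expiry to apply, _id must hold a supported date type.
  db.events.insertOne({ _id: new Date(), note: "short-lived document" });
}
```

No separate TTL index is created here; the expiry setting lives on the Collection itself.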

Non-clustered Collection

  • The _id index and the Documents are stored separately.
  • A query requires 2 reads, and an insert, update or delete requires 2 writes (1 for the index and 1 for the Document).

Clustered Collection

  • The _id index and the Documents are stored together, in _id value order.
  • A query requires only 1 read, and an insert, update or delete requires only 1 write.

Note: the collection size returned by the collStats command includes the clustered index size.

Behavior

A clustered Collection stores Documents ordered by the clustered index key value. There can only be 1 clustered index, because the Documents can be stored in only one order.

Only Collections with a clustered index store Documents in sorted order.

It might be a good idea to use secondary indexes together with the clustered index.

Some limitations

    1. Converting a non-clustered Collection to a clustered one (or vice versa) is not supported. Instead, use an aggregation pipeline to read the Collection and write into another Collection of the desired type (e.g. with a $out or $merge stage).
    2. If a usable secondary index exists alongside the clustered index, a query uses the secondary index by default. You must use hint() to force the clustered index and perform a bounded collection scan (not sure whether that is better than the secondary index).
    3. The clustered index key must be the _id field.
    4. A clustered Collection cannot be a capped Collection.
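
The first two limitations above can be sketched in mongosh like this. It is only a sketch under assumptions: `stocks_old` (an existing non-clustered Collection) and `stocks_clustered` are hypothetical names, the helpers `copyPipeline` and `idRangeFilter` are made up for illustration, and the numeric _id range is arbitrary. The `typeof db` guard lets the snippet load outside mongosh.

```javascript
// Pipeline that copies every document into an existing target collection.
// $merge matches on _id by default.
function copyPipeline(targetName) {
  return [
    { $merge: { into: targetName, whenMatched: "replace", whenNotMatched: "insert" } }
  ];
}

// Filter for the half-open _id range [low, high).
function idRangeFilter(low, high) {
  return { _id: { $gte: low, $lt: high } };
}

const mergePipeline = copyPipeline("stocks_clustered");
const rangeFilter = idRangeFilter(100, 200);

if (typeof db !== "undefined") {
  // 1. Create the clustered target first, then copy the old data into it.
  db.createCollection("stocks_clustered", {
    clusteredIndex: { key: { _id: 1 }, unique: true, name: "stocks clustered key" }
  });
  db.stocks_old.aggregate(mergePipeline).toArray(); // toArray() forces the pipeline to run

  // 2. Hint the clustered index so a secondary index is not picked instead.
  const docs = db.stocks_clustered.find(rangeFilter).hint({ _id: 1 }).toArray();
  print("matched " + docs.length + " documents");
}
```

Running explain() on the hinted query is one way to confirm which plan was actually chosen.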

Define custom clustered index key values

Some criteria for choosing the _id values used as the clustered index key:

  • contain unique values
  • be immutable
  • contain sequentially increasing values (like AUTO_INCREMENT; not mandatory, but improves insert performance)
  • be as small in size as possible, since the key is stored together with the Document.
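
As a rough illustration of the criteria above, the sketch below checks a batch of candidate key values before inserting them. Everything in it is an assumption for illustration: the helper `checkKeyValues`, the `orders` Collection, the index name and the sample integer ids. The check relies on a Set, so it only suits primitive values such as numbers or strings.

```javascript
// Report whether candidate _id values are unique and strictly increasing.
function checkKeyValues(ids) {
  const unique = new Set(ids).size === ids.length;
  const increasing = ids.every((id, i) => i === 0 || ids[i - 1] < id);
  return { unique, increasing };
}

const goodIds = [1001, 1002, 1003]; // small, unique, sequential
const badIds = [1003, 1001, 1001];  // duplicate and out of order

const goodCheck = checkKeyValues(goodIds);
const badCheck = checkKeyValues(badIds);

if (typeof db !== "undefined" && goodCheck.unique) {
  db.createCollection("orders", {
    clusteredIndex: { key: { _id: 1 }, unique: true, name: "orders clustered key" }
  });
  // Supply our own _id values instead of the auto-generated ObjectId.
  db.orders.insertMany(goodIds.map((id) => ({ _id: id, status: "new" })));
}
```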

Check if a Collection is Clustered

db.runCommand({ listCollections: 1 }) // check for options > clusteredIndex
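
The same check can be wrapped in a small helper. This is a sketch, assuming each listCollections entry exposes the setting under options.clusteredIndex as noted above; the helper `isClustered` and the two hand-written sample entries are hypothetical and only there so the snippet can be tried without a server.

```javascript
// True when a listCollections entry carries a clusteredIndex option.
function isClustered(collectionInfo) {
  return Boolean(
    collectionInfo &&
    collectionInfo.options &&
    collectionInfo.options.clusteredIndex
  );
}

// Hand-written entries shaped like listCollections output, for illustration only.
const clusteredInfo = {
  name: "stocks",
  type: "collection",
  options: { clusteredIndex: { v: 2, key: { _id: 1 }, name: "stocks clustered key", unique: true } }
};
const plainInfo = { name: "plain", type: "collection", options: {} };

if (typeof db !== "undefined") {
  // getCollectionInfos() is the mongosh helper around listCollections.
  db.getCollectionInfos().forEach((info) => {
    print(info.name + " clustered: " + isClustered(info));
  });
}
```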

Create Clustered Collection example

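// create
// one method: the create command
db.runCommand({
  create: "stocks",
  clusteredIndex: {
    // same fields as in the createCollection() call below
    key: { _id: 1 },
    unique: true,
    name: "stocks clustered key"
  }
})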

// preferred method (an alternative to the command above; run one or the other)
db.createCollection(
  "stocks", // Collection's name
  {
    clusteredIndex: {
      key: {
        _id: 1 // clusters on the _id field; this does not set _id's value to 1
      },
      unique: true,
      name: "stocks clustered key" // clustered index's name
    }
  }
)