- Different kinds of applications nowadays
- Focus on solving new problems
- We can store queries just like any other object (JSON)
- Replication (vertical scaling) vs. sharding (horizontal scaling)
- Support subscriptions
- Training
- Consulting
- Partners
- 4 levels of write consistency:
- connected to the server and received an ACK
- connected and the write is in memory
- fsync: written to disk
- fsync on replicas: written to disk on every replica
- Journaling
Table scans - slow - O(n)
Indexed: BTREE lookup - faster - O(log n)
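The difference can be sketched in plain JavaScript by counting comparisons; a sorted array stands in for the BTREE's ordered keys (illustrative, not MongoDB internals):

```javascript
// Full scan: must touch every document until (or past) the match — O(n).
function tableScan(docs, target) {
  let comparisons = 0;
  for (const d of docs) {
    comparisons++;
    if (d === target) return { found: true, comparisons };
  }
  return { found: false, comparisons };
}

// Indexed lookup: binary search over ordered keys — O(log n).
function indexedLookup(sortedKeys, target) {
  let comparisons = 0, lo = 0, hi = sortedKeys.length - 1;
  while (lo <= hi) {
    comparisons++;
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] === target) return { found: true, comparisons };
    if (sortedKeys[mid] < target) lo = mid + 1; else hi = mid - 1;
  }
  return { found: false, comparisons };
}

const keys = Array.from({ length: 1024 }, (_, i) => i);
console.log(tableScan(keys, 1000).comparisons);     // 1001 comparisons
console.log(indexedLookup(keys, 1000).comparisons); // 10 comparisons
```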
Profiler = nice tool to analyze slow queries
query.explain(); --> look at "nscanned"
If more than one index could satisfy a query, mongod runs a "speculative search": it tries all candidate plans, tracking the ratio n/nscanned for each => then it keeps using the best one.
- Covering Indexes
- Query answered entirely from the index (no document fetch)
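A minimal sketch of the covering idea, with illustrative data: when the index entry already holds every field the query returns, the documents themselves are never read.

```javascript
// Hypothetical document store and a fetch counter to prove no doc is read.
const documents = { 1: { _id: 1, email: "a@x.com", bio: "..." } };
let docFetches = 0;
function fetchDoc(id) { docFetches++; return documents[id]; }

// Index entry already "covers" the projection {email: 1, _id: 1}.
const emailIndex = [{ key: "a@x.com", id: 1 }];

// find({email: "a@x.com"}, {email: 1, _id: 1}) resolved from the index alone:
const hit = emailIndex.find(e => e.key === "a@x.com");
const covered = { _id: hit.id, email: hit.key };

console.log(covered);    // { _id: 1, email: "a@x.com" }
console.log(docFetches); // 0 — no document fetch was needed
```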
- Sparse Indexes
- No index entry is created if the document lacks the indexed field. Otherwise (non-sparse), "null" is used for indexing.
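A plain JS sketch (not MongoDB's implementation) of that difference — the sparse flag skips documents missing the field, while the default indexes them under null:

```javascript
// Build a toy list of index entries for one field.
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const has = Object.prototype.hasOwnProperty.call(doc, field);
    if (!has && sparse) continue;                       // sparse: no entry at all
    entries.push({ key: has ? doc[field] : null, id: doc._id });
  }
  return entries;
}

const docs = [
  { _id: 1, email: "a@example.com" },
  { _id: 2 },                                           // no "email" field
  { _id: 3, email: "c@example.com" },
];

console.log(buildIndex(docs, "email").length);                   // 3 (doc 2 indexed as null)
console.log(buildIndex(docs, "email", { sparse: true }).length); // 2 (doc 2 skipped)
```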
Geospatial Indexes
Listing: db.posts.getIndexes();
Background building: db.posts.ensureIndex(..., {background: true});
In-memory sorting of query results is limited to 32 MB => create an index on the sort fields
What if your data distribution changes? How do you update your (cached) query plan?
- after ~100 writes, mongo will "unlearn" the plan
- adding/removing indexes invalidates cached query plans
Query plans are automatically stored per query pattern.
If nscanned is (currently) 10x more than what's stored in the query plan, other plans are re-evaluated (insurance against a bad QP / relearning)
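The relearning behavior can be sketched like this (simplified; the 10x threshold comes from the notes, everything else is illustrative):

```javascript
// Toy query-plan cache keyed by query pattern.
class QueryPlanCache {
  constructor() { this.cache = new Map(); } // pattern -> { plan, nscanned }

  record(pattern, plan, nscanned) {
    this.cache.set(pattern, { plan, nscanned });
  }

  // Returns the cached plan, or null when all plans must race again.
  lookup(pattern, lastNscanned) {
    const hit = this.cache.get(pattern);
    if (!hit) return null;
    if (lastNscanned > 10 * hit.nscanned) { // data distribution changed
      this.cache.delete(pattern);           // "unlearn" and re-evaluate
      return null;
    }
    return hit.plan;
  }
}

const qpc = new QueryPlanCache();
qpc.record("{age: ?}", "idx_age", 50);
console.log(qpc.lookup("{age: ?}", 120)); // "idx_age" (within 10x of 50)
console.log(qpc.lookup("{age: ?}", 900)); // null (re-evaluate plans)
```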
- Avoid single point of failure
- Availability and durability
- Fire & forget (default)
- getLastError()
- j:true = guarantee that journal was written
- fsync: true
- w: n, "majority", tag
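As a rough sketch, the durability levels above map onto getLastError options like this (the option names `j`, `fsync`, and `w` are real; the helper function itself is hypothetical):

```javascript
// Illustrative mapping from a durability level to getLastError options.
function writeConcern(level) {
  switch (level) {
    case "ack":      return {};                         // server received the write
    case "journal":  return { j: true };                // journal written to disk
    case "fsync":    return { fsync: true };            // data files flushed
    case "replicas": return { w: "majority", j: true }; // acknowledged by a majority
    default: throw new Error("unknown level: " + level);
  }
}

console.log(writeConcern("journal"));  // { j: true }
console.log(writeConcern("replicas")); // { w: "majority", j: true }
```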
- Hidden node
- Priority 0 node
- Priorities can be used to decide which node becomes primary
- Arbiters (to break ties) => lets you run 2 data nodes + 1 arbiter (odd number of nodes)
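The arithmetic behind the arbiter: electing a primary requires a strict majority of voting members, so 2 data nodes + 1 arbiter survives one data-node failure, while 2 voters alone cannot elect a primary (a one-line sketch):

```javascript
// A primary can be elected only if a strict majority of voters is reachable.
function canElectPrimary(votersUp, votersTotal) {
  return votersUp > votersTotal / 2;
}

console.log(canElectPrimary(1, 2)); // false: 2 data nodes, one down => no primary
console.log(canElectPrimary(2, 3)); // true: surviving data node + arbiter elect one
```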
mongos can be deployed on the application server (and talk to your application via loopback interface, localhost)
mongos is like a router
- You must have exactly 3 config servers in production (you can run with 1 for development)
They should, obviously, be on different machines
- The shard key is immutable in 2 ways:
- Cannot change the shard key for a sharded collection
- Cannot update the value of the key field
As you insert more data you get more chunks (splits)
Chunks stay within a shard
Shards are balanced so they all hold roughly the same number of chunks
Distributed Merge Sort
Chunks contain sequential data, but shards need not
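The "Distributed Merge Sort" above can be sketched as a k-way merge: each shard returns its results already sorted, so the router only merges the sorted streams (plain JS, illustrative):

```javascript
// Merge k sorted arrays (one per shard) into one sorted result.
function mergeSortedStreams(streams) {
  const idx = streams.map(() => 0);
  const out = [];
  for (;;) {
    let best = -1;
    for (let s = 0; s < streams.length; s++) {
      if (idx[s] < streams[s].length &&
          (best === -1 || streams[s][idx[s]] < streams[best][idx[best]])) {
        best = s; // this stream currently has the smallest head element
      }
    }
    if (best === -1) return out; // all streams exhausted
    out.push(streams[best][idx[best]++]);
  }
}

console.log(mergeSortedStreams([[1, 4, 9], [2, 3, 10], [5]]));
// [ 1, 2, 3, 4, 5, 9, 10 ]
```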
A shard key doesn't have to be unique, but it should be granular: it should split fairly easily.
- Example: (country, userid)
- Activity Stream: shard by userid
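A sketch of range-based chunk splitting on a compound-style key like "country:userid" (the threshold and key format are illustrative, not MongoDB's actual values):

```javascript
const MAX_DOCS_PER_CHUNK = 4; // illustrative split threshold

// A chunk covers [min, max) of shard-key values; max === null means unbounded.
function insertKey(chunks, key) {
  const chunk = chunks.find(c => key >= c.min && (c.max === null || key < c.max));
  chunk.keys.push(key);
  if (chunk.keys.length > MAX_DOCS_PER_CHUNK) split(chunks, chunk);
}

// Split at the median key: the old chunk keeps the lower half,
// a new chunk takes the upper half of the range.
function split(chunks, chunk) {
  chunk.keys.sort();
  const mid = chunk.keys[chunk.keys.length >> 1];
  const right = { min: mid, max: chunk.max, keys: chunk.keys.filter(k => k >= mid) };
  chunk.max = mid;
  chunk.keys = chunk.keys.filter(k => k < mid);
  chunks.push(right);
}

const chunks = [{ min: "", max: null, keys: [] }];
["br:1", "br:2", "us:1", "us:2", "br:3", "us:3"].forEach(k => insertKey(chunks, k));
console.log(chunks.length); // 2: the initial chunk split once as data grew
```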
- Doesn't use JS; implemented in C++
- Aggregation pipelines
- Operations:
- $match: selector/filter (like "db.coll.find()")
- $project: reshape results (like XSLT) (1-to-1)
- include/exclude fields
- operations are storable (JSON structs)
- $unwind: can "stream" arrays (1-to-many)
- $group
- $sort
- Use $match and $sort as early as possible
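The pipeline stages above can be simulated over a plain in-memory array (field names and data are illustrative): $match filters early, $unwind fans each array element out to its own document, $project reshapes, and $group aggregates.

```javascript
const posts = [
  { author: "ana", tags: ["db", "mongo"], votes: 3 },
  { author: "bob", tags: ["db"], votes: 5 },
  { author: "ana", tags: ["perf"], votes: 2 },
];

const result = posts
  .filter(p => p.votes >= 3)                         // $match: filter early
  .flatMap(p => p.tags.map(tag => ({ ...p, tag })))  // $unwind: 1-to-many
  .map(({ author, tag }) => ({ author, tag }))       // $project: reshape, 1-to-1
  .reduce((groups, { tag }) => {                     // $group: count per tag
    groups[tag] = (groups[tag] || 0) + 1;
    return groups;
  }, {});

console.log(result); // { db: 2, mongo: 1 } — "perf" was $match-ed away
```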
- /data/db/test.ns (16MB) has namespace for collections + indices
- Use ext4 or XFS for performance because of pre-allocation features
- Storage organized in "extents"
- Use db.coll.validate(true) to see allocation info
- Collections are stored in extents as linked lists
- Indices use extents as BTREE
- Journaling is for fast crash recovery (not for durability)
- db.stats(); db.coll.stats();