@rhcarvalho
Created July 15, 2012 23:55
MongoDB São Paulo - 13/07/2012


1. Welcome Keynote - Paul Pedersen (10gen)

  • Different kinds of applications nowadays
  • Focus on solving new problems
  • We can store queries just like any other object (JSON)
  • Replication: vertical scaling vs. Sharding: horizontal scaling

10gen services (how they make $)

  • Support subscriptions
  • Training
  • Consulting
  • Partners

2. MongoDB: Project Supercharger - Norberto Leite (Telefónica, Barcelona)

  • 4 levels of (write) consistency
    1. connected to the server and received an ACK
    2. connected and saved in memory
    3. fsync: written to disk
    4. fsync replica: written to disk on all replicas
  • Journaling

3. Indexing and Query Optimization - Pedersen (10gen)

  • Table scans - slow - O(n)

  • Indexed: BTREE lookup - faster - O(log n)

  • Profiler = nice tool to analyze slow queries

  • query.explain(); --> look at "nscanned"

  • If more than one index could be used, mongod does a "speculative search": it tries all of them, remembering the n/nscanned ratio, and then uses the best one.

  • Covering Indexes
    • Query resolved using only the index
  • Sparse Indexes
    • No index entry is created if the document lacks the indexed field. In a non-sparse index, "null" is used for indexing such documents.
  • Geospatial Indexes

  • Listing: db.posts.getIndexes();

  • Background building: db.posts.ensureIndex(..., {background: true});

  • Sorting query results without an index is limited to 32 MB => create an index on the sort fields

  • What if your data distribution changes? How do you update your (cached) Query Plan?

    • after 100 writes mongo will "unlearn" the plan
    • adding/removing indexes also resets the query plans
  • Query Plans are automatically stored by query pattern.

  • If the current nscanned is 10x more than the value stored with the query plan (QP), the other plans are evaluated again (bad-QP insurance / relearning)
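
The scan vs. index difference and the "try every plan, keep the best n/nscanned" idea can be illustrated outside MongoDB. This is only a rough sketch in plain JavaScript, not mongod's implementation: a sorted array stands in for the BTREE, `nscanned` here simply counts the keys or documents the toy code looks at, and the names `tableScan`, `indexLookup` and `pickPlan` are made up for this example.

```javascript
// Toy collection: 1000 documents with a unique numeric field "age".
// (Hypothetical data; a sorted array of keys stands in for a BTREE index.)
const docs = [];
for (let i = 0; i < 1000; i++) docs.push({ _id: i, age: i });
const ageIndex = docs.map((d) => d.age); // already sorted ascending

// Table scan: inspect every document, O(n).
function tableScan(target) {
  let nscanned = 0;
  let n = 0;
  for (const d of docs) {
    nscanned++;
    if (d.age === target) n++;
  }
  return { plan: "tableScan", n, nscanned };
}

// Index lookup: binary search over the sorted keys, O(log n).
function indexLookup(target) {
  let lo = 0;
  let hi = ageIndex.length - 1;
  let nscanned = 0;
  let n = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    nscanned++;
    if (ageIndex[mid] === target) {
      n = 1;
      break;
    }
    if (ageIndex[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { plan: "indexLookup", n, nscanned };
}

// Try every candidate plan and keep the one with the best n/nscanned ratio.
function pickPlan(target) {
  const results = [tableScan(target), indexLookup(target)];
  results.sort((a, b) => b.n / b.nscanned - a.n / a.nscanned);
  return results[0];
}

const best = pickPlan(742);
console.log(best.plan, best.nscanned);
```

With 1000 documents the scan always looks at all 1000, while the binary search needs at most 10 steps, so the index plan wins the comparison.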

Schema Design - Kevin Hanson (10gen)

Replication - Spencer Brody (10gen)

  • Avoid single point of failure
  • Availability and durability
    • Fire & forget (default)
    • getLastError()
      • j:true = guarantee that journal was written
      • fsync: true
      • w: n, "majority", tag
  • Hidden node
  • Priority 0 node
  • Priorities can be used to decide which node becomes primary
  • Arbiters (to break ties) => typically 2 data nodes + 1 arbiter, to keep an odd number of voting nodes
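
A small sketch of two points from this talk, in plain JavaScript: the getLastError options written as a command document, and why an odd number of voting nodes is preferred. The helpers `getLastErrorCommand`, `majority` and `tolerableFailures` are hypothetical names for this example; the code only builds objects and does arithmetic, it does not talk to a server.

```javascript
// Build a getLastError command document from the options listed above.
// Hypothetical helper: it only builds the object.
function getLastErrorCommand(options) {
  return Object.assign({ getLastError: 1 }, options);
}

// j: true  => wait until the write is in the journal.
const journaled = getLastErrorCommand({ j: true });
// w: "majority" => wait until a majority of members have the write.
const replicated = getLastErrorCommand({ w: "majority" });

// Votes needed to elect a primary in a set with `members` voting members.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function tolerableFailures(members) {
  return members - majority(members);
}

console.log(JSON.stringify(journaled));
console.log(JSON.stringify(replicated));
console.log(tolerableFailures(2), tolerableFailures(3), tolerableFailures(4));
// → 0 1 1
```

Two data nodes alone tolerate no failure, adding an arbiter (3 voters) tolerates one, and a fourth voter adds nothing over three.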

Sharding - Spencer Brody (10gen)

  • mongos can be deployed on the application server (and talk to your application via loopback interface, localhost)

  • mongos is like a router

  • You must have exactly 3 config servers (you can run with 1 for development)

    They should, obviously, be on different machines

  • The shard key is immutable in 2 ways:
    • Cannot change the shard key for a sharded collection
    • Cannot update the value of the key field
  • As you insert more data you get more chunks (splits)

  • Chunks stay within a shard

  • Shards are balanced so that each one holds roughly the same number of chunks

  • Distributed Merge Sort

  • Chunks contain sequential data, but shards need not

  • A shard key doesn't have to be unique, but it should be granular, so that chunks can be split fairly easily.

    • Example: (country, userid)
    • Activity Stream: shard by userid
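
To make the chunk ideas concrete, here is a toy model in plain JavaScript, using the (country, userid) example key. It is a sketch under simplifying assumptions, not how mongos or the config servers are implemented: chunks are plain objects, `null` stands for the open ends of the key space, and `compareKeys`, `findChunk` and `splitChunk` are names invented for this example.

```javascript
// Compare two compound shard keys [country, userid] lexicographically.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Each chunk covers the half-open key range [min, max) and lives on one shard.
// null stands for "minus infinity" / "plus infinity" in this sketch.
const chunks = [
  { min: null, max: ["BR", 500], shard: "shardA" },
  { min: ["BR", 500], max: ["US", 0], shard: "shardB" },
  { min: ["US", 0], max: null, shard: "shardA" },
];

// Route a key to its chunk, the way a router conceptually would.
function findChunk(key) {
  return chunks.find(
    (c) =>
      (c.min === null || compareKeys(c.min, key) <= 0) &&
      (c.max === null || compareKeys(key, c.max) < 0)
  );
}

// Split a chunk in two at `splitKey`; both halves stay on the same shard.
function splitChunk(chunk, splitKey) {
  const right = { min: splitKey, max: chunk.max, shard: chunk.shard };
  chunk.max = splitKey;
  chunks.splice(chunks.indexOf(chunk) + 1, 0, right);
  return right;
}

console.log(findChunk(["BR", 42]).shard); // → shardA
console.log(findChunk(["BR", 900]).shard); // → shardB
splitChunk(findChunk(["BR", 900]), ["BR", 1000]);
console.log(chunks.length); // → 4
```

Note that shardA holds the first and the last chunk: each chunk is a contiguous key range, but the ranges on one shard need not be adjacent.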

Aggregation Framework - Pedersen (10gen)

  • Doesn't use JS; implemented in C++
  • Aggregation pipelines
  • Operations:
    • $match: selector/filter (like "db.coll.find()")
    • $project: reshape results (like XSLT) (1-to-1)
      • include/exclude fields
      • operations are storable (JSON structs)
    • $unwind: can "stream" arrays (1-to-many)
    • $group
    • $sort
  • Use $match and $sort as early as possible
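
What each stage does to the documents can be mimicked in memory. The following is an illustrative sketch in plain JavaScript, not the aggregation framework itself: the sample `posts` data and the helpers `match`, `unwind`, `groupCount` and `sortByCountDesc` are assumptions made for this example, each standing in for the stage named in its comment.

```javascript
// Hypothetical sample documents: blog posts with an array of tags.
const posts = [
  { author: "ana", published: true, tags: ["mongodb", "index"] },
  { author: "bob", published: true, tags: ["mongodb"] },
  { author: "ana", published: false, tags: ["draft"] },
];

// $match: keep only documents passing the filter (like find()).
const match = (docs, pred) => docs.filter(pred);

// $unwind: emit one document per array element (1-to-many).
const unwind = (docs, field) =>
  docs.flatMap((d) => d[field].map((v) => ({ ...d, [field]: v })));

// $group with a count accumulator: one output document per distinct key.
const groupCount = (docs, field) => {
  const counts = new Map();
  for (const d of docs) counts.set(d[field], (counts.get(d[field]) || 0) + 1);
  return [...counts].map(([key, count]) => ({ _id: key, count }));
};

// $sort: order the grouped results by count, descending.
const sortByCountDesc = (docs) => [...docs].sort((a, b) => b.count - a.count);

// Filtering first means the later stages process fewer documents.
const result = sortByCountDesc(
  groupCount(unwind(match(posts, (d) => d.published), "tags"), "tags")
);
console.log(JSON.stringify(result));
// → [{"_id":"mongodb","count":2},{"_id":"index","count":1}]
```

The unpublished post is dropped by the first stage, so its "draft" tag is never unwound or grouped.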

Journaling and Storage - Spencer Brody

  • /data/db/test.ns (16MB) holds the namespaces for collections + indices
  • Use ext4 or XFS for performance because of pre-allocation features
  • Storage organized in "extents"
  • Use db.coll.validate(true) to see allocation info
  • Collections are stored in extents as linked lists
  • Indices use extents as BTREE
  • Journaling is for fast crash recovery (not for durability)
  • db.stats(); db.coll.stats();