Mongo Notes from MongoSF 2011

MongoDB Notes

Building Web Apps with MongoDB

MapReduce is a two-step process
- map function emits key/value pairs from each document in a collection
- reduce function takes the output of the maps, aggregates it by key, and gives you the result in a collection
- http://www.mongodb.org/display/DOCS/MapReduce
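
To make the two steps concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how the server implements MapReduce: the `mapReduce` helper, the `posts` data, and passing `emit` as an argument (the server exposes it as a global) are all assumptions made for illustration.

```javascript
// In-memory sketch of the two MapReduce steps; not the server implementation.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // Step 1: map. Each document emits zero or more (key, value) pairs.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // the document is available as `this`
  }
  // Step 2: reduce. All values emitted for one key collapse into one result.
  const out = [];
  for (const [key, values] of groups) {
    // As in MongoDB, reduce is skipped for keys that received a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

// Hypothetical sample data: count how often each tag is used.
const posts = [
  { title: "a", tags: ["mongodb", "ruby"] },
  { title: "b", tags: ["mongodb"] },
];

const tagCounts = mapReduce(
  posts,
  function (emit) {
    this.tags.forEach((tag) => emit(tag, 1));
  },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(tagCounts);
```

The output documents use the `{ _id, value }` shape that MapReduce result collections have.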

MongoDB Schema Design

How to start thinking in terms of rich document modeling
mongo makes you feel like you are denormalizing your data; it makes your data feel more object-like
being object-like is a huge gain of mongo
a collection is a set of documents, equivalent to a table
NO joins in mongo, but there is embedding
sophisticated query system, not as good as SQL, but pretty decent
updates to a single document are atomic and isolated
Considerations
- no joins
- documents are atomic
mongo id is a bson specific id that is given to you
you get an automatic timestamp as well
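
The automatic timestamp comes from the id itself: the first 4 bytes of an ObjectId hold the creation time in seconds since the Unix epoch. A small sketch of reading it from the hex form of an id follows; the `objectIdToDate` helper and the sample id are made up for illustration. In the shell, `getTimestamp()` on the ObjectId should give the same value.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are the creation time
// in seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical id, made up for illustration.
const created = objectIdToDate("4dd1c2a30000000000000000");
console.log(created.toISOString());
```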
You can examine the query plan by using .explain()
- http://www.mongodb.org/display/DOCS/Explain
cool update operators such as $push, $pull, $pop, etc.
The 'dot' operator
- reach into the fields of the objects
- http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
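
A rough picture of what reaching into an object means, as a plain JavaScript sketch. The `getPath` helper and the `person` document are hypothetical, and real dot notation also reaches into arrays, which this sketch does not handle.

```javascript
// Resolve a dotted path such as "address.city" against a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Hypothetical document.
const person = {
  name: "Joe",
  address: { city: "San Francisco", state: "CA" },
};

console.log(getPath(person, "address.city"));

// A shell query on the same field would look like this
// (the collection name `people` is an assumption):
//   db.people.find({ "address.city": "San Francisco" })
```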
Modify atomically
- findAndModify allows you to find and modify atomically
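
A common use of findAndModify is an atomic counter. The sketch below assumes a shell-style `db` handle; the `counters` collection, the `seq` field, and the `nextSequence` name are hypothetical.

```javascript
// `db` is a shell-style database handle. The `counters` collection and the
// `seq` field are hypothetical names chosen for this sketch.
function nextSequence(db, name) {
  const doc = db.counters.findAndModify({
    query: { _id: name },
    update: { $inc: { seq: 1 } },
    new: true, // return the document as it looks after the update
    upsert: true, // create the counter on first use
  });
  return doc.seq;
}

// Shell usage, assuming a running server:
//   nextSequence(db, "orders")
```

Because the find and the modify happen as one operation, two clients calling this at the same time should never receive the same number.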
have the db conform to the application you are trying to build

MongoDB Performance Tuning by Shutterfly

Shutterfly doesn't use any cloud-based infrastructure; they run on their own private servers
they come from traditional RDBMS environments
Data modeling matters; it is where you start tuning
General tuning order
- modeling
- statement tuning
- instance tuning
- hardware tuning
- data modeling (http://www.mongodb.org/display/DOCS/MongoDB+Data+Modeling+and+Rails)
Statement tuning
- enable the profiler and leave it on; it has low overhead
- What to look for?
  - full scans
    - nreturned vs nscanned
  - updates
    - fastmod (fastest)
    - moved (exceeds reserved space in document)
    - key updates (indexes need update)
- explain()
  - use during development
  - use when you find bad operations in profiler
  - db.foo.find().explain()
    - index usage; nscanned vs nreturned
    - nYields = number of times the query yielded its lock so other operations could run
    - covered indexes: the query can be answered entirely from the index, with no need to read the documents themselves
    - run twice to measure in-memory speed
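
The explain() checks above can be turned into a small helper. This is a sketch only: the field names (`cursor`, `n`, `nscanned`, `indexOnly`) are assumed to match the explain() output of the 1.8-era server, and `reviewExplain` and the sample plan are made up for illustration.

```javascript
// Summarize an explain() result. Field names are an assumption based on
// 1.8-era output; newer servers report different fields.
function reviewExplain(plan) {
  const returned = plan.n;
  const scanned = plan.nscanned;
  return {
    // "BasicCursor" means no index was used, i.e. a full collection scan.
    fullScan: plan.cursor === "BasicCursor",
    // Entries scanned per document returned; close to 1 is the goal.
    scanRatio: returned === 0 ? scanned : scanned / returned,
    // True when the query was answered from the index alone.
    covered: plan.indexOnly === true,
  };
}

// Hypothetical explain() output for an unindexed query.
const samplePlan = { cursor: "BasicCursor", n: 10, nscanned: 50000, indexOnly: false };
console.log(reviewExplain(samplePlan));

// Shell usage, assuming a collection named foo:
//   reviewExplain(db.foo.find({ a: 1 }).explain())
```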
High performance writes
- Tuning
  - read before write
  - profiler
    - tune for fastmod
  - architectural changes
    - split by collection
    - shard
High performance reads
- cache to disk ratio
  - try to have enough memory in system for your indexes
  - mongostat faults column
- data locality
  - organize data for optimized I/O path. Minimize I/O per query
Tools
- mongostat
  - aggregate instance level information
    - faults: cache misses
    - lock%: tune updates
- mtop
  - good picture of current session level information
- iostat
  - how much physical I/O you are doing?
is it faster to use a single thread for writes?
- yes

MongoDB Shell hacks

shell is spidermonkey
what is it good for?
- debugging
- administration
- scripting glue
- NOT for building apps

Migrating from MySQL to MongoDB by Craigslist

Endianness http://en.wikipedia.org/wiki/Endianness
Shard http://en.wikipedia.org/wiki/Sharding
SAN http://en.wikipedia.org/wiki/Storage_area_network
Lesson: Replica Sets Rock
Lesson: Know your data
- mongodb is utf-8
Lesson: Know your data size
- maximum document size is 4 MB in 1.6.x and 16 MB in 1.8.x
Lesson: Know some sharding
- balancer can be your frenemy
- initial insert rate: 8000/sec
- http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/

Performance indicators of MongoDB

mongostat
- like iostat
- gives you your virtual size
- provided by a database command called serverStatus
  - db.serverStatus();
- profiler
  - db.setProfilingLevel(2)
    - 2 = profile all operations; 1 = only operations (insert, read, write) that take longer than a certain number of milliseconds (the default is 100)
- principles for indexing
  - same as RDBMS
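
A sketch of using the profiler to list recent slow operations. It assumes a shell-style `db` handle and that level 1 is the slow-operations-only level (level 2 records everything); `recentSlowOps` is a hypothetical helper name.

```javascript
// Turn on slow-operation profiling and fetch the most recent slow entries.
// `db` is a shell-style database handle.
function recentSlowOps(db, slowMs, limit) {
  // Level 1 records only operations slower than slowMs milliseconds.
  db.setProfilingLevel(1, slowMs);
  return db.system.profile
    .find({ millis: { $gt: slowMs } })
    .sort({ ts: -1 }) // newest first
    .limit(limit)
    .toArray();
}

// Shell usage, assuming a running server:
//   recentSlowOps(db, 100, 10).forEach(printjson)
```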
Monitoring service
- Nagios and Munin as well as MMS (Mongo Monitoring service)
Write block percentage
- Concurrency
  - one writer OR many readers
web-console
- there is always an HTTP page with console info at port 28017
background flushing
- 10gen tells people to RAID their EBS volumes
connection leaks are sometimes an issue
Network bytes in and out
- important for read heavy applications
Fragmentation
- padding factor
  - you cannot manually set padding factor right now
  - dynamically calculated; it is the amount of extra space to leave around a document so that it can grow when you update it
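
A back-of-the-envelope sketch of what a padding factor means for space. The multiplication model and the `allocatedBytes` helper are simplifying assumptions; the numbers are made up.

```javascript
// Space reserved for a document under a given padding factor.
// Assumes the simple model: allocated = document size * padding factor.
function allocatedBytes(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

// Hypothetical numbers: a 1000-byte document with a padding factor of 1.5.
const docBytes = 1000;
const spare = allocatedBytes(docBytes, 1.5) - docBytes;
console.log("room to grow before the document must move:", spare, "bytes");

// In a shell of this era the current value should be visible as
//   db.foo.stats().paddingFactor
// (the collection name foo is an assumption).
```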
Journaling
- recommend having a second spindle just for the journal because syncing to the journal is a little expensive
you can create a secondary index in the background
- can take a secondary offline, build the index, and then sync it back up
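
A sketch of requesting a background index build from the shell. `ensureIndex` is the shell call of this era (newer shells use `createIndex`); `buildIndexInBackground` and the names in the usage line are hypothetical.

```javascript
// Ask the server to build an index without blocking other operations.
// `collection` is a shell-style collection handle such as db.foo.
function buildIndexInBackground(collection, keys) {
  return collection.ensureIndex(keys, { background: true });
}

// Shell usage, assuming a collection named checkins with a user_id field:
//   buildIndexInBackground(db.checkins, { user_id: 1 })
```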

MongoDB @ foursquare

nginx, Haproxy
mongodb and migrating off of postgres
what we love about mongodb
- fast
- indexes and rich queries
- sharding and auto-balancing
- replication (see http://engineering.foursquare.com/2011/05/24/fun-with-mongodb-replica-sets/)
lessons learned
- keep working set in memory
  - keep indexes in memory
- avoid long-running queries
- monitor everything (per collection stats)
  - application level metrics is always good to monitor
- use small field names for large collections
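
Why small field names matter: every document stores its own field names, so a long name is paid for once per document. A rough estimate of the savings, assuming ASCII names; the `bytesSavedByRenaming` helper, the field names, and the document count are made up for illustration.

```javascript
// Estimate bytes saved across a collection by shortening field names.
// Assumes ASCII names, so one character is one byte.
function bytesSavedByRenaming(renames, docCount) {
  let perDocument = 0;
  for (const [longName, shortName] of Object.entries(renames)) {
    perDocument += longName.length - shortName.length;
  }
  return perDocument * docCount;
}

// Hypothetical rename map applied to 100 million documents.
const saved = bytesSavedByRenaming({ created_at: "c", user_id: "u" }, 100000000);
console.log((saved / 1e9).toFixed(1) + " GB saved");
```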

MongoDB in Ruby

use the mongo gem and the bson gem, because BSON is the native object format
the bson_ext gem makes it a bit faster
all ruby types map to bson types
object ids are NOT strings
MongoMapper recommended over Mongoid. There is also Mongomatic

MongoDB in the Cloud

You need to size every member of your replica set as if it were the primary
A typical MongoD should run on a Large or Extra Large standard on-demand instance on EC2
A big MongoD should run on an Extra Large, Double Extra Large, or Quadruple Extra Large high-memory on-demand instance on EC2
Small instance on EC2 is 32-bit so DO NOT use it
ConfigD/Arbiter can run on a micro instance on EC2
High-CPU Medium is 32-bit so DO NOT use it on EC2. High-CPU in general is just not necessary. More RAM is more important than having more CPU
Operating Systems (Debian, Ubuntu, Fedora, Redhat, FreeBSD)
- Turn off atime
- Raise file descriptor limits
  - cat >> /etc/security/limits.conf << EOF
    * hard nofile 65536
    * soft nofile 65536
    EOF
- Use ext4, xfs
- DO NOT use large VM pages
- Use RAID
  - RAID10 on MongoD
  - RAID1 on ConfigDB
MongoD on EC2
- LVM or MDADM
- 64-Bit EC2 instance
- striping = partitions of mirrors
MongoS on EC2
- Runs on Application server
- doesn't need disk, ebs volume, raid
- 32 or 64 bit instance
Arbiter on EC2
- Meant to vote on elections
- Normally only needed about once a week
- Do not run it on the same node as MongoD
- 64 bit EC2 instance, micro or small is fine
ConfigDB on EC2
- LVM or MDADM
- 64 bit EC2 instance micro or small is fine
Deployment scenarios
- 3 - Node replica set
  - 2 large MongoD in US-East one is primary and one is secondary with RAID 10
  - 1 secondary MongoD with priority = 0 (cannot become a primary) in US-West also with RAID 10
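
The 3-node scenario above could be written as a replica set config roughly like this. The `threeNodeConfig` helper, the set name, and the host names are placeholders; in the shell the resulting object would be passed to `rs.initiate()`.

```javascript
// Build a replica set config for two members in one region plus a
// priority-0 member in another region.
function threeNodeConfig(setName, eastHosts, westHost) {
  return {
    _id: setName,
    members: [
      { _id: 0, host: eastHosts[0] },
      { _id: 1, host: eastHosts[1] },
      // priority 0: keeps a copy of the data but can never become primary
      { _id: 2, host: westHost, priority: 0 },
    ],
  };
}

// Placeholder set name and host names.
const config = threeNodeConfig(
  "rs0",
  ["east-a.example.com:27017", "east-b.example.com:27017"],
  "west-a.example.com:27017"
);
console.log(JSON.stringify(config, null, 2));

// Shell usage, assuming the three mongod processes are already running:
//   rs.initiate(config)
```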
way to find out which is the master
- db.isMaster() in the mongo shell
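
A sketch of checking a node's role from the shell with `db.isMaster()`. The `describeRole` helper is hypothetical, and it assumes the response carries the `ismaster`, `secondary`, and `primary` fields.

```javascript
// Report whether the node behind `db` is the primary.
// `db` is a shell-style database handle.
function describeRole(db) {
  const status = db.isMaster();
  if (status.ismaster) return "primary";
  if (status.secondary) return "secondary (primary is " + status.primary + ")";
  return "neither primary nor secondary";
}

// Shell usage, assuming a connection to a replica set member:
//   print(describeRole(db))
```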
