Mongo petset

MongoDB is a document database that supports range and field queries.

Replication

A single server can run either standalone or as part of a replica set. A "replica set" is a set of mongod instances with 1 primary.

Primary: receives writes and services reads. Can step down and become a secondary.
Secondary: replicates the primary's oplog. If the primary goes down, the secondaries hold an election.
Arbiter: used to achieve a majority vote when the set has an even number of members. Holds no data, does not need a dedicated node, and never becomes primary.

Replication is asynchronous. Failover: if the primary does not communicate with the other members for more than 10 seconds, the secondaries hold an election. Roles:

Arbiter: only votes, holds no data. Don't deploy more than 1 per replica set.
Priority 0: cannot trigger elections and cannot become primary. Can service reads and vote.
Hidden: just like priority 0, but invisible to clients, so it cannot service reads; it can only vote. Does maintain a copy of the primary's data.
Delayed: just like hidden, but applies the primary's operations after a delay, to protect against e.g. human error.
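
The roles above map onto per-member fields in the replica set config document. The following is a minimal sketch, not a definitive config: the set name "rs0" and all hostnames are made up, check_roles is a hypothetical helper, and the delay field is spelled slaveDelay here (MongoDB 5.0 and later call it secondaryDelaySecs).

```python
# Hypothetical replica set config illustrating the member roles.
# The set name and hostnames are invented for this sketch.
config = {
    "_id": "rs0",
    "members": [
        # Ordinary member: holds data, votes, may become primary.
        {"_id": 0, "host": "mongo-0.example:27017"},
        # Priority 0: holds data, votes, serves reads, never primary.
        {"_id": 1, "host": "mongo-1.example:27017", "priority": 0},
        # Hidden: priority 0 and invisible to clients.
        {"_id": 2, "host": "mongo-2.example:27017", "priority": 0, "hidden": True},
        # Delayed: hidden, and applies operations one hour late.
        {"_id": 3, "host": "mongo-3.example:27017", "priority": 0,
         "hidden": True, "slaveDelay": 3600},
        # Arbiter: votes only, holds no data.
        {"_id": 4, "host": "mongo-4.example:27017", "arbiterOnly": True},
    ],
}


def check_roles(cfg):
    """Return a list of violations of the role rules described in these notes."""
    problems = []
    members = cfg["members"]
    if sum(1 for m in members if m.get("arbiterOnly")) > 1:
        problems.append("more than one arbiter")
    for m in members:
        priority = m.get("priority", 1)  # members default to priority 1
        if m.get("hidden") and priority != 0:
            problems.append(f"member {m['_id']}: hidden requires priority 0")
        if m.get("slaveDelay", 0) > 0 and not (priority == 0 and m.get("hidden")):
            problems.append(f"member {m['_id']}: delayed should be hidden with priority 0")
    return problems


print(check_roles(config))  # []
```

A document shaped like this is what would be handed to rs.initiate() or rs.reconfig() in the mongo shell; the checks only encode the rules listed in these notes, not everything the server validates.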

Fault tolerance

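Fault tolerance is the number of members that can become unavailable while the cluster can still elect a primary. Electing a primary takes a majority of the voting members. Example: 50 members, 7 voting members => 46 can go down (all 43 non-voting members, but only 3 of the voting members, since 4 of 7 are needed for a majority). WAN deployment: 1 member per DC in 3 DCs can tolerate a single DC going down.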
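
The arithmetic behind the 50-member example can be sketched as follows. The function names are made up for illustration; the only rule assumed is that an election needs a strict majority of the voting members.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(total_members: int, voting_members: int) -> tuple[int, int]:
    """Return (voting members that may fail, members overall that may fail)
    while the set can still elect a primary."""
    voting_failures = voting_members - majority(voting_members)
    non_voting = total_members - voting_members
    # Non-voting members never affect an election, so all of them may be down.
    return voting_failures, non_voting + voting_failures


print(fault_tolerance(50, 7))  # (3, 46)
print(fault_tolerance(3, 3))   # (1, 1), e.g. one DC out of three
```

The second call matches the WAN case: with one voting member in each of 3 DCs, losing a single DC still leaves 2 of 3 votes.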

Configuration

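Write concern: by default requests acknowledgement only from the primary; override it per write operation to require acknowledgement from a number of secondaries as well.
Read concern: local or majority. Local returns the most recent data on the member being read, which may later be rolled back; majority returns only data that a majority of members have acknowledged. Which member serves the read is controlled separately, by read preference.
Oplog size: the default depends on the storage engine. There are 3 engines: in-memory, WiredTiger, MMAPv1.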
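
To make the write concern setting concrete, here is a deliberately simplified model of when a write counts as acknowledged. write_acknowledged is a hypothetical helper, not a driver API, and it assumes every voting member holds data; the real server computes the majority more carefully.

```python
def write_acknowledged(w, acked_members: int, voting_members: int) -> bool:
    """Simplified model of write concern.

    w              -- an integer (1 means primary only) or the string "majority"
    acked_members  -- members that have applied the write, primary included
    voting_members -- voting members in the set (assumed to all hold data)
    """
    required = voting_members // 2 + 1 if w == "majority" else w
    return acked_members >= required


# 3-member set, write applied on the primary only:
print(write_acknowledged(1, 1, 3))           # True
print(write_acknowledged("majority", 1, 3))  # False
# The same write after one secondary has replicated it:
print(write_acknowledged("majority", 2, 3))  # True
```

The trade-off is latency against durability: a write acknowledged by the primary alone returns sooner, but it is exactly the kind of write that can be lost in a rollback.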

Failover

New members, or secondaries that fall too far behind, must resync everything. Starting mongod with an empty datadir forces an initial sync. Starting it with a copy of a recent datadir from another member of the set makes the sync much faster.

Changing hostnames

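There are two ways to do this:

Keeping the set available: change the hostnames of all secondaries, wait until they catch up, ask the primary to step down, then bounce the clients.
With downtime: stop all members, restart each one offline using the same datadir but a different port (so clients can't connect), write the revised replica set config, then start the members under the new hostnames in the normal way.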
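
Either way, the core of the change is rewriting the host field of each member in the replica set config. A minimal sketch of that step on a plain dictionary, assuming the config shape used by rs.conf(); rename_hosts and all hostnames are hypothetical, and the sketch bumps the version on the assumption that a replacement config must carry a higher version than the one it replaces.

```python
import copy


def rename_hosts(config: dict, new_hosts: dict) -> dict:
    """Return a copy of a replica set config with member hosts replaced.

    new_hosts maps old "host:port" strings to new ones; members that are
    not in the mapping keep their current host.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for member in updated["members"]:
        member["host"] = new_hosts.get(member["host"], member["host"])
    updated["version"] = updated.get("version", 1) + 1
    return updated


old = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "old-0.example:27017"},
        {"_id": 1, "host": "old-1.example:27017"},
    ],
}
new = rename_hosts(old, {"old-1.example:27017": "new-1.example:27017"})
print([m["host"] for m in new["members"]])
# ['old-0.example:27017', 'new-1.example:27017']
```

Member _id values stay the same on purpose: only the address changes, so the set still recognises each member.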

2 problems

Rollbacks: during a network partition, or when a secondary can't keep up with the primary, the primary goes down and a stale secondary becomes primary. When the old primary rejoins the set, it has to roll back the writes it accepted that never replicated. Such a rollback will not happen if the write propagated to a healthy, reachable secondary, because that secondary will become primary.
False elections: rebooting 2 secondaries simultaneously in a 3 member replica set forces the primary to step down, meaning it closes all sockets (Connection reset by peer) until one of the secondaries becomes available.
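
Both problems can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how mongod is implemented: oplogs are plain lists of made-up operation labels, and a primary is assumed to keep its role only while it can reach a majority of voting members, itself included. The function names are hypothetical.

```python
def rolled_back_ops(old_primary_oplog: list, new_primary_oplog: list) -> list:
    """Operations the old primary must undo when it rejoins: everything
    after the last entry that both oplogs have in common."""
    common = 0
    for mine, theirs in zip(old_primary_oplog, new_primary_oplog):
        if mine != theirs:
            break
        common += 1
    return old_primary_oplog[common:]


def can_stay_primary(reachable_voting: int, total_voting: int) -> bool:
    """A primary steps down once it no longer sees a majority (itself included)."""
    return reachable_voting >= total_voting // 2 + 1


old_primary = ["w1", "w2", "w3", "w4"]

# A stale secondary only had w1 and w2 when it was elected, then took w5.
print(rolled_back_ops(old_primary, ["w1", "w2", "w5"]))  # ['w3', 'w4']

# A healthy secondary had everything, so nothing is rolled back.
print(rolled_back_ops(old_primary, ["w1", "w2", "w3", "w4", "w5"]))  # []

# 3 member set with both secondaries rebooting: the primary sees only itself.
print(can_stay_primary(1, 3))  # False
print(can_stay_primary(2, 3))  # True
```

The last two lines show why rebooting secondaries one at a time avoids the step-down: the primary always keeps 2 of 3 votes in view.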