Replication in MongoDB
Main Reasons
HA - High Availability
Data Safety
General Info
Replication is Async
But writes can be acknowledged (ACK) via write concern
Single primary (so no direct eventual consistency when reading from the primary)
Has somewhat statement-based replication (not binary based)
One operation is logged per affected document, even when a single statement affects multiple documents
Replica Sets
Automatic Failover
A new primary is elected when the current primary goes down
Client libraries then talk to the new primary
This process takes about 10 seconds (the downtime applies only to writes)
Since reads can be served by any node, there is no downtime for reads
Automatic Recovery
Primary
Rolls back writes that were never sent to the secondary servers
Archives those rolled-back writes
Gets new writes from the new primary
Becomes a secondary and joins the replica set
Secondary
Pulls the new writes it has not yet received from the primary
Joins the replica set as a secondary
Creating a Replica Set
Best Practices
don’t use raw IP addresses (we use them below only for ease of demonstration)
don’t use names from /etc/hosts
use DNS
pick a good TTL
Steps to Create a Replica Set (Unix)
Start mongod
cd /tmp
mkdir mongo && cd mongo
mkdir 1 2 3
mongod --port 27001 --replSet abc --logpath 1.log --logappend --fork --dbpath 1 --rest
mongod --port 27002 --replSet abc --logpath 2.log --logappend --fork --dbpath 2 --rest
mongod --port 27003 --replSet abc --logpath 3.log --logappend --fork --dbpath 3 --rest
Configure it
mongo --port 27001
var cfg = {
"_id" : "abc",
"members" : [
{
"_id" : 0,
"host" : "127.0.0.1:27001"
},
{
"_id" : 1,
"host" : "127.0.0.1:27002"
},
{
"_id" : 2,
"host" : "127.0.0.1:27003"
}
]
}
rs.initiate(cfg)
Check the status
rs.status()
Write to the Primary and Read from the Others
On the primary, write as normal
On a secondary, we need to run the following to allow reads
Since replication from primary to secondary is asynchronous, reads from a secondary can be eventually consistent
rs.slaveOk()
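A minimal round-trip sketch (the database, collection, and document are illustrative):
// on the primary (mongo --port 27001)
use test
db.people.insert({ name: "alice" })
// on a secondary (mongo --port 27002)
use test
rs.slaveOk()
db.people.find()   // may briefly lag behind the primary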
Kill Primary
This is a hard kill (it should not be used in normal scenarios)
kill -9 <pid of the mongod>
Check status from others
rs.status()
the killed mongod is now reported as unreachable by the other members
Write from new primary
Write normally on the new primary
Reload killed machine
mongod --port 27001 --replSet abc --logpath 1.log --logappend --fork --dbpath 1
See Replica Set Status
mongo --port 27001
rs.status()
This node becomes a secondary, since there is already a healthy primary
It will then sync from the primary
Monitoring / Status
MongoDB starts an HTTP status UI for each mongod instance at port dbPort + 1000 (e.g., 28001 for the instance started on 27001 above)
With --rest enabled, this UI also shows replica-set info
Optime
A special sequence number in MongoDB that uniquely identifies each of the primary's write operations; it is called the optime
It has the following format (like a tuple)
(<32-bit timestamp, with one-second resolution>, <32-bit counter>)
It means: this is the nth write operation within that particular second
You can see it in the admin UI or via rs.status()
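As an illustration (the values are made up), an optime rendered by the shell looks like:
Timestamp(1482314100, 3)   // the 3rd write operation within the second 1482314100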
Oplog
logs all the write operations that occur on the primary
this log is used to replay those operations on the secondary instances
If the set is in sync, every instance's log ends with the same optime
since this log is stored in a capped collection, the oldest entries are eventually overwritten
Normally in production this takes about 5% of the disk space, and that is the recommended size
To view it, you can click the optime value in the members table of the UI
Or query the oplog collection directly
use local
db.oplog.rs.find()
db.printReplicationInfo() also gives a brief summary of the oplog
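For reference, a single insert shows up in the oplog roughly like this (the values are illustrative, and some fields vary by version):
{
  "ts" : Timestamp(1482314100, 1),   // the optime of this operation
  "op" : "i",                        // operation type: i = insert, u = update, d = delete
  "ns" : "test.people",              // namespace the operation applies to
  "o"  : { "_id" : ObjectId("585a4f2b9b16c4b2a4f3e8d1"), "name" : "alice" }
}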
Configuration
Reconfiguring requires a majority of the set to be up; each config carries a version number, and if multiple versions exist the highest one wins
How to
var cfg = rs.config();
cfg.members[2].priority = 0;
cfg.members[2].hidden = true;
rs.reconfig(cfg);
Options
Priority (priority)
This is the priority for becoming PRIMARY
A member with priority 0 never becomes PRIMARY
Hidden (hidden)
Hidden from the clients
Requires priority = 0
db.isMaster() does not list hidden members
Votes (votes)
Sets the number of votes this member has when electing a PRIMARY
Default is 1.
Never use this
Slave Delay (slaveDelay)
value is specified in seconds
delays applying updates from the PRIMARY by the given amount
can be used as a rolling temporary backup if a large delay is set
Requires priority = 0
It is automatically hidden too
ArbiterOnly (arbiterOnly)
Does not store data
Only participates in electing a PRIMARY
Only needs a small server
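A minimal sketch combining some of these options (the delay value and the arbiter's address are illustrative):
var cfg = rs.conf()
cfg.members[2].priority = 0        // required for hidden and slaveDelay
cfg.members[2].hidden = true       // clients will not see this member
cfg.members[2].slaveDelay = 3600   // apply the primary's writes one hour late
rs.reconfig(cfg)
rs.addArb("127.0.0.1:27004")       // add a data-less arbiter that only votes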
Tags
Each member can be given a document of custom tags; useful together with write concern (getLastError)
e.g.:
conf = rs.conf()
conf.members[0].tags = { "dc": "east", "use": "production" }
conf.members[1].tags = { "dc": "east", "use": "reporting" }
conf.members[2].tags = { "use": "production" }
rs.reconfig(conf)
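Building on the tags above, they can back a custom write-concern mode via getLastErrorModes (a sketch; the mode name "use2" is made up):
conf = rs.conf()
conf.settings = conf.settings || {}
conf.settings.getLastErrorModes = { "use2": { "use": 2 } }   // require acks from members covering 2 distinct "use" tag values
rs.reconfig(conf)
db.getLastError("use2", 5000)   // wait up to 5 seconds for the custom mode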
Write Concern
Checks whether our writes have been committed as expected
Checked with db.getLastError(<w value>, <wtimeout>)
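For example (the collection name and timeout are illustrative):
db.people.insert({ name: "bob" })
db.getLastError("majority", 5000)   // wait until a majority acknowledges the write, or give up after 5 seconds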
Strategies
Use w=3 (assume replication factor is 3) - checks for each node in the replSet
Use w:majority - checks for the majority in the set
Use w:1
Use nth pattern
if we have 100000 inserts
check getLastError with w:majority every nth insert (see the sketch after this list)
and also check the very last one
Check only the first and the last writes
do nothing
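A minimal sketch of the nth pattern in the shell (collection name, batch size, and timeout are illustrative):
for (var i = 0; i < 100000; i++) {
  db.people.insert({ n: i });
  if (i % 1000 === 0) {
    db.getLastError("majority", 5000);   // check every 1000th write
  }
}
db.getLastError("majority", 5000);   // and check once at the very end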
Reasons
Very critical data where every node must have the write
e.g., banking
For most write-concern apps, so that we know the write has been committed to the majority of the replSet
We only confirm the write reached the primary; that can also be enough for some write-concern apps
Just to check that we are actually getting data into the replSet
if the socket stays open
and every nth check succeeds
we can assume the writes in between made it too
Same as above, with even less concern
We simply don’t care
Capacity Planning
mongod has a limit on connections per node (the PRIMARY in this case)
so we need to be aware of that
and size the connection pools in our apps accordingly
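One quick way to see where a node stands (a sketch):
db.serverStatus().connections   // reports current and available connections on this mongod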
Monitoring
We can monitor this with getLastError and the nth pattern, measuring how long writes take to reach the set
Obviously we can use a timeout too
It's wise to do this before going to production
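Besides the nth-pattern checks, the shell has built-in helpers for a quick look (a sketch):
db.printReplicationInfo()        // oplog size and time range on this member
db.printSlaveReplicationInfo()   // how far each secondary is behind
rs.status()                      // per-member state, health, and optime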
Limitations
No more than 12 members
No more than 7 voters
Read Preference
The following options are valid for newer client drivers that are replica-set aware
mongos also supports this
these are only available from v2.2 onwards
Options (and reasons to use each)
primary
We need reads that are consistent with our writes
primaryPreferred
Eventually consistent reads are OK while the primary is down (it takes ~10 secs to elect a new primary)
secondary
When eventually consistent reads are okay
For offline workload (like analytics)
Geographical separation
Separation on work load
secondaryPreferred
High availability
But in the worst case, all queries will be sent to the primary, and it must be able to handle that load
nearest
Spread the workload and speed up reads (reduce network latency)
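A minimal sketch of setting a read preference in the mongo shell (the chosen mode is just an example):
db.getMongo().setReadPref("secondaryPreferred")   // prefer secondaries, fall back to the primary
db.people.find()   // subsequent reads on this connection follow the preference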