Created December 21, 2016 09:55
# Replication in MongoDB
## Main Reasons
- HA (High Availability)
- Data safety
## General Info
- Replication is asynchronous
- Writes can be acknowledged (ACK)
- There is a single primary (no direct eventual consistency for writes)
- Replication is roughly statement-based (not binary-based): the log records one statement per affected document, even if your query was a single line that affected multiple documents
## Replica Sets
### Automatic Failover
- When the leader goes down, a new leader is assigned
- Client libraries will then talk to the new primary
- This process takes about 10 seconds (downtime for writes only)
- Since reads can be served by any node, there is no downtime for reads
### Automatic Recovery
When a former primary rejoins:
- Rolls back commits that were never sent to the secondary servers
- Archives those commits
- Gets the new commits from the new master
- Becomes a secondary and joins the replica set

When a secondary rejoins:
- Fetches the new commits it has not yet received from the master
- Joins the replica set as a secondary
## Creating a Replica Set
### Best Practices
- Don't use raw IP addresses (we'll use them below only for ease of use)
- Don't use names from /etc/hosts
- Use DNS
- Pick a good TTL
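Following the best practices above, a production member list would use DNS names rather than raw IPs. A minimal sketch — the `mongo-rs-N.example.com` hostnames are placeholders, not part of the original notes:

```javascript
// Hypothetical replica set config using DNS names instead of raw IPs.
// The mongo-rs-N.example.com hostnames are placeholders.
var cfg = {
  "_id": "abc",
  "members": [
    { "_id": 0, "host": "mongo-rs-1.example.com:27017" },
    { "_id": 1, "host": "mongo-rs-2.example.com:27017" },
    { "_id": 2, "host": "mongo-rs-3.example.com:27017" }
  ]
};
// In the mongo shell, this object would then be passed to rs.initiate(cfg).
```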
### Creating the Replica Set (Unix)
Start the mongod instances:

    cd /tmp
    mkdir mongo && cd mongo
    mkdir 1 2 3
    mongod --port 27001 --replSet abc --logpath 1.log --logappend --fork --dbpath 1 --rest
    mongod --port 27002 --replSet abc --logpath 2.log --logappend --fork --dbpath 2 --rest
    mongod --port 27003 --replSet abc --logpath 3.log --logappend --fork --dbpath 3 --rest

Configure the set:

    mongo --port 27001
    var cfg = {
      "_id" : "abc",
      "members" : [
        { "_id" : 0, "host" : "127.0.0.1:27001" },
        { "_id" : 1, "host" : "127.0.0.1:27002" },
        { "_id" : 2, "host" : "127.0.0.1:27003" }
      ]
    }
    rs.initiate(cfg)
Look at the status:

    rs.status()
### Write to the Primary, Read from the Others
On the master, write as normal. On a slave we need the following command to allow reads; since writes replicate asynchronously from master to secondary, the data read there can be eventually consistent:

    rs.slaveOk()
### Kill the Primary
This is a hard kill (it should not be used in normal scenarios):

    kill -9 <pid of the mongod>

Check the status from the other members:

    rs.status()

The killed mongo host is now reported as having no connection.
### Write to the New Primary
Write normally to the new master.
### Restart the Killed Instance

    mongod --port 27001 --replSet abc --logpath 1.log --logappend --fork --dbpath 1

### Check the Replica Set Status

    mongo --port 27001
    rs.status()

Since the set already has a good master, this instance becomes a slave and syncs with the master.
## Monitoring / Status
- MongoDB starts a web UI for each db instance at port = dbPort + 1000
- That UI also shows replica-set info
## Optime
MongoDB assigns each of the primary db's write operations a special sequence number that uniquely identifies it in the log, called the optime. It has the following tuple-like format:

    (<32-bit timestamp, second resolution>, <32-bit counter>)

It says: this is the nth write operation within this particular second. You can see it in the admin UI or in the output of rs.status().
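The tuple format above implies a simple ordering: compare timestamps first, then the per-second counter. A minimal sketch (plain JavaScript, not a real mongo shell helper):

```javascript
// Compare two optimes represented as [timestampSeconds, counter] pairs.
// Writes are ordered first by the second-resolution timestamp, then by
// the counter of writes within that second.
function compareOptime(a, b) {
  if (a[0] !== b[0]) return a[0] - b[0]; // different seconds: timestamp wins
  return a[1] - b[1];                    // same second: nth write wins
}

// Two writes in the same second are ordered by the counter:
// compareOptime([1482314100, 2], [1482314100, 5]) is negative,
// so the first write is the older one.
```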
## Oplog
- Logs all the write operations that occurred on the leader
- These log entries are used to transfer the operations to the secondary instances
- If the set is in sync, all the db instances have the same final optime in their logs
- Since this log is stored in a capped collection, the oldest entries eventually get overwritten
- In production it normally takes 5% of the disk space, which is the recommended size
- To view it, click optime in the table of the UI, or query the oplog collection:

    use local
    db.oplog.rs.find()

db.printReplicationInfo() also gives a brief summary of the oplog.
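For orientation when reading db.oplog.rs.find() output, an entry looks roughly like the sketch below. This is an approximation — exact fields vary by server version, and the `ts` value is really a mongo shell Timestamp, shown here as a plain pair:

```javascript
// Approximate shape of one oplog entry (fields vary by MongoDB version).
var entry = {
  ts: [1482314100, 1],          // optime: (timestamp seconds, counter)
  op: "i",                      // operation: i = insert, u = update, d = delete, n = no-op
  ns: "test.people",            // namespace: <db>.<collection>
  o:  { _id: 1, name: "Sam" }   // the document / operation payload
};
```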
## Configuration
A reconfig needs a majority of the set to be up; the config carries a version number so members can tell which of multiple config versions is current.
### How To

    var cfg = rs.config();
    cfg.members[2].priority = 0;
    cfg.members[2].hidden = true;
    rs.reconfig(cfg);
### Options
Priority (priority)
- The priority for becoming PRIMARY
- A member with priority 0 never becomes PRIMARY

Hidden (hidden)
- Hidden from the clients
- Requires priority = 0
- db.isMaster() does not show this member

Votes (votes)
- Sets the number of votes a member has when electing a PRIMARY
- Default is 1
- Never use this

Slave Delay (slaveDelay)
- Value is specified in seconds
- Delays applying updates from the PRIMARY by the given amount
- With a large value it can be used as a temporary backup
- Requires priority = 0
- Such a member is automatically hidden too

Arbiter Only (arbiterOnly)
- Stores no data
- Only used to elect the PRIMARY
- Only needs a small server
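The delayed-member and arbiter options can be applied the same way as the earlier reconfig example. A hedged mongo-shell sketch (to run against a live set, not standalone; the arbiter host on port 27004 is hypothetical):

```javascript
// Sketch: make member 2 a delayed member (priority 0 is required,
// and a slaveDelay member is hidden automatically).
var cfg = rs.config();
cfg.members[2].priority = 0;
cfg.members[2].slaveDelay = 3600;  // apply writes one hour late (temporary backup)
rs.reconfig(cfg);

// An arbiter has its own shell helper:
rs.addArb("127.0.0.1:27004");
```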
### Tags
A member can be given a document of custom tags. Useful when working with write concern (getLastError). For example:

    conf = rs.conf()
    conf.members[0].tags = { "dc": "east", "use": "production" }
    conf.members[1].tags = { "dc": "east", "use": "reporting" }
    conf.members[2].tags = { "use": "production" }
    rs.reconfig(conf)
## Write Concern
Checks whether our writes have been replicated as required, using db.getLastError(<w value>, <wtimeout>).

### Strategies (and when to use them)
- w=3 (assuming the replication factor is 3): waits for every node in the replSet. For very critical data where every node must have it (e.g. banking).
- w: "majority": waits for a majority of the set. For most write-concern apps, so we know the write has been committed to the replSet.
- w=1: only confirms that the write reached the primary. Acceptable for some less critical write-concern apps.
- The nth pattern: for a bulk load (say 100,000 inserts), call getLastError with majority for every nth insert, and once at the end. It just checks that we got the data to the replSet: if the socket stays open and every nth check does well, we can assume the rest arrived too.
- First and last only: the same idea with even less concern.
- Do nothing: we simply don't care.
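The nth pattern above can be sketched as a small loop. This is a simulation in plain JavaScript: `insert` and `confirmReplication` are hypothetical callbacks standing in for a driver insert and a getLastError("majority") call, not real driver APIs:

```javascript
// Sketch of the "check every nth" write-concern strategy:
// confirm replication every n inserts, and once more at the end
// for any leftover writes ("check for at last").
function bulkInsertWithNthCheck(docs, n, insert, confirmReplication) {
  var sinceLastCheck = 0;
  for (var i = 0; i < docs.length; i++) {
    insert(docs[i]);
    sinceLastCheck++;
    if (sinceLastCheck === n) {
      confirmReplication(); // e.g. getLastError with w: "majority"
      sinceLastCheck = 0;
    }
  }
  if (sinceLastCheck > 0) confirmReplication(); // confirm the final partial batch
}
```

With 100,000 inserts and n = 1,000 this issues only about 100 replication checks instead of one per write.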
## Capacity Planning
- mongod has a limit on connections per node (the PRIMARY in this case), so be aware of it
- Size the connection pools in your apps with that limit in mind

## Monitoring
- We can monitor replication with getLastError and the nth pattern: measure how long the write takes to get there
- Obviously we can use a timeout too
- It's wise to do this before going to production
## Limitations
- No more than 12 members
- No more than 7 voting members
## Read Preference
These options are valid for newer, replica-set-aware client drivers; mongos supports them too. They are only available from v2.2 onward.

| Option | Reason |
| --- | --- |
| primary | We need reads that are fully consistent with our writes |
| primaryPreferred | Eventually consistent reads are OK while the primary is down (electing a new primary takes ~10 secs) |
| secondary | Eventually consistent reads are OK: offline workloads (like analytics), geographical separation, separation of workloads |
| secondaryPreferred | High availability; but in the worst case all queries go to the primary, which must be able to handle that |
| nearest | Spread the workload and speed up reads (reduce network latency) |
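Drivers typically take the read preference through the connection string. A sketch with placeholder hostnames (the replica set name `abc` matches the examples above):

```
mongodb://mongo-rs-1.example.com:27017,mongo-rs-2.example.com:27017/?replicaSet=abc&readPreference=secondaryPreferred
```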