Created December 21, 2016 09:55
# Replication in MongoDB
## Main Reasons
- HA (High Availability)
- Data safety
## General Info
- Replication is asynchronous
- Writes can be acknowledged (ACK)
- There is a single primary (no direct eventual consistency for writes)
- Replication is roughly statement-based (not binary-based): the log records one statement per affected document, even if your query was a single line that affected multiple documents
## Replica Sets
### Automatic Failover
- When the leader goes down, a new leader is assigned
- Client libraries will then talk to the new primary
- This process takes about 10 seconds (downtime for writes only)
- Since reads can be served by any node, there is no downtime for reads
### Automatic Recovery
When a former primary rejoins:
- Rolls back commits that were never sent to the secondary servers
- Archives those commits
- Gets the new commits from the new master
- Becomes a secondary and joins the replica set

When a secondary rejoins:
- Fetches the new commits it has not yet received from the master
- Joins the replica set as a secondary
## Creating a Replica Set
### Best Practices
- Don't use raw IP addresses (we'll use them below only for ease of use)
- Don't use names from /etc/hosts
- Use DNS
- Pick a good TTL
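Following the best practices above, a production member list would use DNS names rather than raw IPs. A minimal sketch — the `mongo-rs-N.example.com` hostnames are placeholders, not part of the original notes:

```javascript
// Hypothetical replica set config using DNS names instead of raw IPs.
// The mongo-rs-N.example.com hostnames are placeholders.
var cfg = {
  "_id": "abc",
  "members": [
    { "_id": 0, "host": "mongo-rs-1.example.com:27017" },
    { "_id": 1, "host": "mongo-rs-2.example.com:27017" },
    { "_id": 2, "host": "mongo-rs-3.example.com:27017" }
  ]
};
// In the mongo shell, this object would then be passed to rs.initiate(cfg).
```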
### Creating the Replica Set (Unix)
Start the mongod instances:

    cd /tmp
    mkdir mongo && cd mongo
    mkdir 1 2 3
    mongod --port 27001 --replSet abc --logpath 1.log --logappend --fork --dbpath 1 --rest
    mongod --port 27002 --replSet abc --logpath 2.log --logappend --fork --dbpath 2 --rest
    mongod --port 27003 --replSet abc --logpath 3.log --logappend --fork --dbpath 3 --rest

Configure the set:

    mongo --port 27001
    var cfg = {
      "_id" : "abc",
      "members" : [
        { "_id" : 0, "host" : "127.0.0.1:27001" },
        { "_id" : 1, "host" : "127.0.0.1:27002" },
        { "_id" : 2, "host" : "127.0.0.1:27003" }
      ]
    }
    rs.initiate(cfg)
Look at the status:

    rs.status()
### Write to the Primary, Read from the Others
On the master, write as normal. On a slave we need the following command to allow reads; since writes replicate asynchronously from master to secondary, the data read there can be eventually consistent:

    rs.slaveOk()
### Kill the Primary
This is a hard kill (it should not be used in normal scenarios):

    kill -9 <pid of the mongod>

Check the status from the other members:

    rs.status()

The killed mongo host is now reported as having no connection.
### Write to the New Primary
Write normally to the new master.
### Restart the Killed Instance

    mongod --port 27001 --replSet abc --logpath 1.log --logappend --fork --dbpath 1

### Check the Replica Set Status

    mongo --port 27001
    rs.status()

Since the set already has a good master, this instance becomes a slave and syncs with the master.
## Monitoring / Status
- MongoDB starts a web UI for each db instance at port = dbPort + 1000
- That UI also shows replica-set info
## Optime
MongoDB assigns each of the primary db's write operations a special sequence number that uniquely identifies it in the log, called the optime. It has the following tuple-like format:

    (<32-bit timestamp, second resolution>, <32-bit counter>)

It says: this is the nth write operation within this particular second. You can see it in the admin UI or in the output of rs.status().
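The tuple format above implies a simple ordering: compare timestamps first, then the per-second counter. A minimal sketch (plain JavaScript, not a real mongo shell helper):

```javascript
// Compare two optimes represented as [timestampSeconds, counter] pairs.
// Writes are ordered first by the second-resolution timestamp, then by
// the counter of writes within that second.
function compareOptime(a, b) {
  if (a[0] !== b[0]) return a[0] - b[0]; // different seconds: timestamp wins
  return a[1] - b[1];                    // same second: nth write wins
}

// Two writes in the same second are ordered by the counter:
// compareOptime([1482314100, 2], [1482314100, 5]) is negative,
// so the first write is the older one.
```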
## Oplog
- Logs all the write operations that occurred on the leader
- These log entries are used to transfer the operations to the secondary instances
- If the set is in sync, all the db instances have the same final optime in their logs
- Since this log is stored in a capped collection, the oldest entries eventually get overwritten
- In production it normally takes 5% of the disk space, which is the recommended size
- To view it, click optime in the table of the UI, or query the oplog collection:

    use local
    db.oplog.rs.find()

db.printReplicationInfo() also gives a brief summary of the oplog.
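For orientation when reading db.oplog.rs.find() output, an entry looks roughly like the sketch below. This is an approximation — exact fields vary by server version, and the `ts` value is really a mongo shell Timestamp, shown here as a plain pair:

```javascript
// Approximate shape of one oplog entry (fields vary by MongoDB version).
var entry = {
  ts: [1482314100, 1],          // optime: (timestamp seconds, counter)
  op: "i",                      // operation: i = insert, u = update, d = delete, n = no-op
  ns: "test.people",            // namespace: <db>.<collection>
  o:  { _id: 1, name: "Sam" }   // the document / operation payload
};
```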
## Configuration
A reconfig needs a majority of the set to be up; the config carries a version number so members can tell which of multiple config versions is current.
### How To

    var cfg = rs.config();
    cfg.members[2].priority = 0;
    cfg.members[2].hidden = true;
    rs.reconfig(cfg);
### Options
Priority (priority)
- The priority for becoming PRIMARY
- A member with priority 0 never becomes PRIMARY

Hidden (hidden)
- Hidden from the clients
- Requires priority = 0
- db.isMaster() does not show this member

Votes (votes)
- Sets the number of votes a member has when electing a PRIMARY
- Default is 1
- Never use this

Slave Delay (slaveDelay)
- Value is specified in seconds
- Delays applying updates from the PRIMARY by the given amount
- With a large value it can be used as a temporary backup
- Requires priority = 0
- Such a member is automatically hidden too

Arbiter Only (arbiterOnly)
- Stores no data
- Only used to elect the PRIMARY
- Only needs a small server
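The delayed-member and arbiter options can be applied the same way as the earlier reconfig example. A hedged mongo-shell sketch (to run against a live set, not standalone; the arbiter host on port 27004 is hypothetical):

```javascript
// Sketch: make member 2 a delayed member (priority 0 is required,
// and a slaveDelay member is hidden automatically).
var cfg = rs.config();
cfg.members[2].priority = 0;
cfg.members[2].slaveDelay = 3600;  // apply writes one hour late (temporary backup)
rs.reconfig(cfg);

// An arbiter has its own shell helper:
rs.addArb("127.0.0.1:27004");
```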
### Tags
A member can be given a document of custom tags. Useful when working with write concern (getLastError). For example:

    conf = rs.conf()
    conf.members[0].tags = { "dc": "east", "use": "production" }
    conf.members[1].tags = { "dc": "east", "use": "reporting" }
    conf.members[2].tags = { "use": "production" }
    rs.reconfig(conf)
## Write Concern
Checks whether our writes have been replicated as required, using db.getLastError(<w value>, <wtimeout>).

### Strategies (and when to use them)
- w=3 (assuming the replication factor is 3): waits for every node in the replSet. For very critical data where every node must have it (e.g. banking).
- w: "majority": waits for a majority of the set. For most write-concern apps, so we know the write has been committed to the replSet.
- w=1: only confirms that the write reached the primary. Acceptable for some less critical write-concern apps.
- The nth pattern: for a bulk load (say 100,000 inserts), call getLastError with majority for every nth insert, and once at the end. It just checks that we got the data to the replSet: if the socket stays open and every nth check does well, we can assume the rest arrived too.
- First and last only: the same idea with even less concern.
- Do nothing: we simply don't care.
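The nth pattern above can be sketched as a small loop. This is a simulation in plain JavaScript: `insert` and `confirmReplication` are hypothetical callbacks standing in for a driver insert and a getLastError("majority") call, not real driver APIs:

```javascript
// Sketch of the "check every nth" write-concern strategy:
// confirm replication every n inserts, and once more at the end
// for any leftover writes ("check for at last").
function bulkInsertWithNthCheck(docs, n, insert, confirmReplication) {
  var sinceLastCheck = 0;
  for (var i = 0; i < docs.length; i++) {
    insert(docs[i]);
    sinceLastCheck++;
    if (sinceLastCheck === n) {
      confirmReplication(); // e.g. getLastError with w: "majority"
      sinceLastCheck = 0;
    }
  }
  if (sinceLastCheck > 0) confirmReplication(); // confirm the final partial batch
}
```

With 100,000 inserts and n = 1,000 this issues only about 100 replication checks instead of one per write.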
## Capacity Planning
- mongod has a limit on connections per node (the PRIMARY in this case), so be aware of it
- Size the connection pools in your apps with that limit in mind

## Monitoring
- We can monitor replication with getLastError and the nth pattern: measure how long the write takes to get there
- Obviously we can use a timeout too
- It's wise to do this before going to production
## Limitations
- No more than 12 members
- No more than 7 voting members
## Read Preference
These options are valid for newer, replica-set-aware client drivers; mongos supports them too. They are only available from v2.2 onward.

| Option | Reason |
| --- | --- |
| primary | We need reads that are fully consistent with our writes |
| primaryPreferred | Eventually consistent reads are OK while the primary is down (electing a new primary takes ~10 secs) |
| secondary | Eventually consistent reads are OK: offline workloads (like analytics), geographical separation, separation of workloads |
| secondaryPreferred | High availability; but in the worst case all queries go to the primary, which must be able to handle that |
| nearest | Spread the workload and speed up reads (reduce network latency) |
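Drivers typically take the read preference through the connection string. A sketch with placeholder hostnames (the replica set name `abc` matches the examples above):

```
mongodb://mongo-rs-1.example.com:27017,mongo-rs-2.example.com:27017/?replicaSet=abc&readPreference=secondaryPreferred
```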