- P2P. No concept of ZooKeeper or named nodes, so there is no single point of failure. Every node in the cluster has the same responsibilities and no node gets any special treatment.
- Adding more machines means a higher number of operations/sec.
- NTP for synchronized clocks across nodes
- Disable swap via
sudo swapoff --all
- Ensure that the cluster name and seeds are set correctly. Seed information is used when joining a node to the cluster.
- cassandra.yaml - most of the configuration for the node/cluster is contained in this file
- cassandra-env.sh - properties for setting up Java/JVM environment variables
- logback.xml - C* logging levels
- cassandra-rackdc.properties - useful when installing C* across DCs. Contains topology information: how a node knows which DC and rack it is on.
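A minimal sketch of the cassandra.yaml settings mentioned above (cluster name, seeds, snitch); the addresses and names are illustrative:

```yaml
cluster_name: 'ProdCluster'            # must match on every node in the cluster
num_tokens: 256                        # tokens per node (see token assignment below)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.1.0.1"     # one seed per DC in this example
listen_address: 10.0.0.5               # this node's address
endpoint_snitch: GossipingPropertyFileSnitch   # reads cassandra-rackdc.properties
```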
Location of C* configurations:
Name | Description | Default location |
---|---|---|
commit log | The commit log is an append-only log to which all writes are written in addition to memtables. If you are using spinning disks it is better to have the commit log mounted on a separate disk so this append-only log can be written to as quickly as possible. **This is the first place where writes become durable.** | /var/lib/cassandra/commitlog or $CASSANDRA_HOME/data/commitlog |
data file directories | This is where the SSTables live. SSTables are read-optimized. | /var/lib/cassandra/data or $CASSANDRA_HOME/data/data |
saved caches | Storage directory for commonly read key and row caches. | /var/lib/cassandra/saved_caches or $CASSANDRA_HOME/data/saved_caches |
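The default locations above map to these cassandra.yaml settings (paths shown are the package-install defaults):

```yaml
commitlog_directory: /var/lib/cassandra/commitlog
data_file_directories:
    - /var/lib/cassandra/data
saved_caches_directory: /var/lib/cassandra/saved_caches
```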
System memory | MAX_HEAP_SIZE |
---|---|
< 2GB | 1/2 of system memory |
> 2GB but < 4GB | 1GB |
> 4GB (should always be the case) | 1/4 of system memory, but not greater than 8GB |
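To override the calculated heap size, both values can be set explicitly in cassandra-env.sh (if you set one, set the other as well; the numbers below are illustrative):

```bash
# cassandra-env.sh - explicit heap settings override the automatic calculation above
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"
```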
A C* cluster is 1..n nodes. All nodes in a cluster have the same cluster name in cassandra.yaml and should be able to identify the cluster using the seeds provided.
- Node - 1 cassandra instance
- Rack - Logical set of nodes
- Data center - Logical set of racks (see the cassandra-rackdc.properties sketch below)
[Best practice] One node per DC should be used when defining the seeds for a cluster.
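Each node declares which DC and rack it belongs to in cassandra-rackdc.properties (used e.g. by the GossipingPropertyFileSnitch); a minimal sketch with illustrative names:

```properties
# cassandra-rackdc.properties
dc=east-coast
rack=rack1
```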
TODO
There are two ways of assigning tokens to a Cassandra ring:
- Single token per node: each node gets one token and is responsible for the data between that token and the previous token.
  - Hard to add/remove nodes with this strategy, as adding a node leaves the cluster unbalanced.
- Multiple tokens assigned per node (vnodes): each node is assigned many tokens (256 by default) spread around the ring.
  - Makes it easier to add/remove nodes without making the cluster unbalanced.
  - num_tokens in cassandra.yaml controls how many tokens each node gets. Default value is 256.
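To see how tokens ended up distributed, a couple of standard nodetool commands (run against any node) help; output columns vary slightly by C* version:

```sh
nodetool status   # per-node state, load, number of tokens, and ownership percentage
nodetool ring     # every token in the ring and which node owns it
```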
RF (replication factor) is the total number of times a write will be replicated in the cluster. An RF of 3 means there will be a total of 3 copies of each write in the cluster.
- RF is defined at a **per keyspace** level and ideally should not be changed.
- If a cluster is spread across DCs then RF can be defined per keyspace per DC, e.g. replicate this keyspace 2 times in the east coast DC and 3 times in the west coast DC.
- Each node has a cassandra-rackdc.properties which is used to identify what rack/DC a node belongs to.
- RF is set per keyspace per DC and is defined at the time of keyspace creation.
- Replication strategy is set per keyspace and is defined at the time of keyspace creation (see the CQL sketch after this list).
  - SimpleStrategy - if you are only going to use 1 DC.
  - NetworkTopologyStrategy
    - When using this and replicating data within a DC across nodes, make sure the replicas are not all in the same rack.
    - The endpoint snitch / cassandra-rackdc.properties can be used to identify the rack and DC location of a node.
- **Also notice that there is no concept of primary and slave when writing and replicating data. All nodes are treated equally.**
- When replicating to a remote DC, similar to the local DC, a coordinator is picked in that DC and it is responsible for replicating the data.
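A sketch of setting the replication strategy and per-DC RF at keyspace creation time; keyspace and DC names are illustrative (the DC names must match what the snitch reports, e.g. from cassandra-rackdc.properties):

```sql
-- single-DC cluster: SimpleStrategy with a cluster-wide RF
CREATE KEYSPACE single_dc_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- multi-DC cluster: NetworkTopologyStrategy with RF per DC
CREATE KEYSPACE multi_dc_ks
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'east-coast': 2,
                      'west-coast': 3};
```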
Consistency level (CL) defines how many replicas will have to agree for a given read/write operation to succeed.
- Can be defined per operation.
- Directly affects the speed of the read/write operation.
- A CL = 1 means that just one of the replicas needs to provide a copy of the data.
- A CL = QUORUM means that a majority (RF/2 + 1) of the replicas will have to agree.
- A CL = LOCAL_QUORUM means that a majority (RF/2 + 1) of the replicas in the local DC will have to agree.
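In cqlsh the consistency level is set per session with the CONSISTENCY command (drivers expose the same choice per statement): with no argument it shows the current level, with an argument it switches subsequent reads/writes in that session to that level.

```
cqlsh> CONSISTENCY
cqlsh> CONSISTENCY QUORUM
cqlsh> CONSISTENCY LOCAL_QUORUM
```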
Assume RF = 3 and CL = 2. Once 2 nodes ack the write, the client will be told success. But the coordinator still needs to replicate the data to the third node. If that 3rd node is dead or not responding, the coordinator will write a copy of this data to the system.hints table and will try to stream it over once the node is back.
- Same concept when DC goes offline
- [Best practice] Need to actively monitor this statistic
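A couple of nodetool commands that help with this; exact pool and command names vary across C* versions, so treat these as a starting point:

```sh
nodetool statushandoff   # whether hinted handoff is currently enabled on this node
nodetool tpstats         # thread pool stats; pending/blocked hint delivery tasks show up here
```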
If you are reading/writing at a CL < RF there is a chance that not all the replicas have caught up and you might get stale data. But eventually the replicas will catch up and all of them will have the same view of the data.
A read immediately followed by a write is considered an anti-pattern in C*, but there are ways to achieve it.
- **Using the below-mentioned CL levels for these reads and writes**
WRITE | READ | Comments |
---|---|---|
CL = ALL | CL = 1 | If you are writing with CL = ALL, all the replicas must already have the data for the write to succeed. So the read can use CL = 1 and can read from any replica. |
CL = QUORUM | CL = QUORUM | You are always reading and writing to majority |
CL = 1 | CL = ALL | You write to one but you read from all to make sure that all the nodes agree. |
**Pick one depending on whether it's a read-heavy or a write-heavy operation. But always check whether the value of immediate consistency is worth the latency.**
[**Best practice**] CL=1 is your friend but use it carefully.
- Lightweight transactions with conditional queries
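A sketch of conditional queries (lightweight transactions, i.e. compare-and-set at the cost of a Paxos round); the table and columns are illustrative:

```sql
-- only insert if the row does not already exist
INSERT INTO users (id, email) VALUES (42, 'a@example.com') IF NOT EXISTS;

-- only apply the update if the current value matches
UPDATE users SET email = 'b@example.com' WHERE id = 42 IF email = 'a@example.com';
```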
**Use cases for read immediately followed by write**
CL = 1 | CL = QUORUM | CL = ALL | Notes |
---|---|---|---|
Lowest latency | Higher latency | Highest latency | Lower is better |
Highest throughput | Lower throughput | Lowest throughput | Higher is better |
Highest availability | Lower availability | Lowest availability | High means the cluster will keep operating even if nodes keep going down |
Stale reads possible | No stale reads if both read and write are at QUORUM | No stale reads if either read or write is at CL = ALL | |
- What is a commit log?
- What are memtables?
- What are SSTables?
- Common nodetool commands
- Common cqlsh commands
- Operations
- How does read repair work?
- How to take down a node?
  - What metrics to observe?
  - What can possibly go wrong?
- How to bring up a node to join the cluster
  - What metrics to observe?
  - What can possibly go wrong?
- What to do when a node goes down?
  - What metrics to look out for?