http://www.ninechapter.com/course/2/
##Basics
- Evolution of Taobao
- Remove IOE (IBM servers, Oracle databases, EMC storage)
- Add Memcached and Hadoop
- Add CDN
- NoSQL
- get(key)
- put(key, value) (toy sketch after this list)
- or CQL in Cassandra
- Table
- Column families in Cassandra, “Table” in HBase, “Collection” in MongoDB
- Don’t always support joins or foreign keys
- Still indexed
- Unstructured / no schema / some columns may be missing / no foreign keys
- Column store
- easy to get individual columns without reading whole rows
- faster range search over a column
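A toy sketch of the get/put interface above, backed by a plain in-memory dict (the `KVStore` name and the lack of persistence are illustrative assumptions, not any real client API):

```python
class KVStore:
    """Toy key-value store showing the NoSQL get/put interface."""

    def __init__(self):
        self._data = {}  # in-memory only; real stores persist to disk

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KVStore()
# schemaless: values are free-form, columns can simply be missing
store.put("user:42", {"name": "Alice"})
store.put("user:43", {"name": "Bob", "city": "Hangzhou"})
print(store.get("user:42"))  # {'name': 'Alice'}
```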
- CAP
- Cloud and AWS
- Sharding vs. distributed
##Design Memcached
##Design topK and tiny URL
##Design a key-value store
###Cassandra
####Replication Strategy
- SimpleStrategy
- RandomPartitioner: Chord-like hash partitioning
- ByteOrderedPartitioner: assigns ranges of keys to servers
- NetworkTopologyStrategy: for multi DC deployments
- Two replicas per DC
- Three replicas per DC
- Per DC
- First replica placed according to Partitioner
- Then go clockwise around ring until you hit a different rack (placement sketch after this list)
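A rough sketch of that placement rule, assuming a small made-up ring (positions, node names, and racks are illustrative): hash the key onto the ring, take the first node clockwise, then keep walking until the next replica lands on a different rack:

```python
import hashlib
from bisect import bisect

# (ring position, node, rack) -- illustrative 4-node, 2-rack topology
NODES = sorted([
    (10, "n1", "rackA"),
    (40, "n2", "rackA"),
    (70, "n3", "rackB"),
    (90, "n4", "rackB"),
])

def ring_position(key, ring_size=100):
    # RandomPartitioner-style: hash the key onto the ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

def place_replicas(key, n_replicas=2):
    positions = [p for p, _, _ in NODES]
    i = bisect(positions, ring_position(key)) % len(NODES)  # first node clockwise
    replicas, racks = [], set()
    for _ in range(len(NODES)):  # at most one full lap around the ring
        _, node, rack = NODES[i]
        # first replica is set by the partitioner; later replicas must land
        # on a rack not used yet (otherwise keep walking clockwise)
        if not replicas or rack not in racks:
            replicas.append(node)
            racks.add(rack)
        if len(replicas) == n_replicas:
            break
        i = (i + 1) % len(NODES)
    return replicas

print(place_replicas("user:42"))  # two replicas, guaranteed on two racks
```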
####Snitches
- Maps IPs to racks and DCs. Configured in the cassandra.yaml config file
- Some options:
- SimpleSnitch: Unaware of Topology (Rack-unaware)
- RackInferring: assumes topology of network by octets of server’s IP address
101.201.202.203 = x.<DC octet>.<rack octet>.<node octet> (sketch after this list)
- PropertyFileSnitch: uses a config file
- EC2Snitch: uses EC2
- EC2 Region = DC
- Availability zone = rack
- Other snitch options available
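A minimal sketch of the RackInferring rule above, reading the DC and rack straight out of the second and third octets of the IP (the function name and label format are illustrative):

```python
def rack_inferring_snitch(ip):
    """Infer (datacenter, rack) from an IPv4 address laid out as
    x.<DC octet>.<rack octet>.<node octet>."""
    _, dc, rack, _node = ip.split(".")
    return f"DC{dc}", f"rack{rack}"

print(rack_inferring_snitch("101.201.202.203"))  # ('DC201', 'rack202')
```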
####Writes
- Client sends write to one coordinator node in Cassandra cluster
- Coordinator may be per-key, per-client, or per-query
- Per-key Coordinator ensures writes for the key are serialized
- Coordinator uses Partitioner to send query to all replica nodes responsible for key
- When X replicas respond, coordinator returns an acknowledgement to the client (see the quorum sketch under Reads)
- Always writable: Hinted Handoff mechanism
- If any replica is down, the coordinator writes to all other replicas, and keeps the write locally until down replica comes back up.
- When all replicas are down, the Coordinator (front end) buffers writes (for up to a few hours).
- One ring per data center
- Per-DC coordinator elected to coordinate with other DCs
- Election done via ZooKeeper, which runs a Paxos (consensus) variant
- Workflow
- When a write arrives, log it in the disk commit log (for failure recovery)
- Make changes to appropriate memtables
- When a memtable is full or old, flush it to disk as an SSTable
- Also write an index file and a Bloom filter for the SSTable (condensed sketch after this list)
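A condensed sketch of that write workflow, assuming a toy flush threshold and a plain `set` standing in for the Bloom filter; this is not Cassandra's actual on-disk format:

```python
import json

MEMTABLE_LIMIT = 4      # assumed tiny flush threshold, just for illustration

commit_log = []         # stands in for the on-disk commit log
memtable = {}           # in-memory buffer of recent writes
sstables = []           # each entry: (sorted immutable data, "bloom filter")

def flush():
    global memtable
    data = dict(sorted(memtable.items()))   # SSTable: sorted, immutable
    bloom = set(data)                       # real Bloom filters are probabilistic
    sstables.append((data, bloom))
    memtable = {}

def write(key, value):
    commit_log.append(json.dumps({"k": key, "v": value}))  # 1. log for recovery
    memtable[key] = value                                  # 2. update memtable
    if len(memtable) >= MEMTABLE_LIMIT:                    # 3. flush when full
        flush()

for i in range(5):
    write(f"key{i}", i)
print(len(sstables), list(memtable))  # 1 SSTable flushed, ['key4'] still in memory
```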
####Compaction and Delete
- Compaction: periodically merge SSTables, keeping only the newest version of each key
- Delete: don’t delete the item right away; write a tombstone to the log; compaction drops it later (sketch below)
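A sketch of tombstone-based deletion and compaction as described above; the `(value, timestamp)` record shape is an assumption for illustration:

```python
TOMBSTONE = object()   # marker written by delete instead of removing data

def delete(memtable, key, ts):
    memtable[key] = (TOMBSTONE, ts)   # deletion is just another write

def compact(*sstables):
    """Merge SSTables: newest timestamp wins, tombstoned keys are dropped."""
    merged = {}
    for table in sstables:
        for key, (value, ts) in table.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return {k: v for k, v in merged.items() if v[0] is not TOMBSTONE}

old = {"a": ("1", 10), "b": ("2", 11)}
new = {"a": (TOMBSTONE, 20)}          # 'a' was deleted later
print(compact(old, new))              # {'b': ('2', 11)}
```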
####Reads
- Coordinator can contact X replicas (e.g., in same rack)
- A row may be split across multiple SSTables => reads need to touch multiple SSTables => reads slower than writes
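A sketch of the coordinator pattern shared by writes (acknowledge after X replica acks) and reads (contact X replicas); replica RPC is faked with direct method calls, and class names are illustrative:

```python
class Replica:
    def __init__(self):
        self.data, self.alive = {}, True

    def store(self, key, value):
        if self.alive:
            self.data[key] = value
        return self.alive            # ack only if the replica is up

class Coordinator:
    """Waits for acks from X of N replicas before answering the client."""

    def __init__(self, replicas):
        self.replicas = replicas     # replica nodes responsible for this key

    def write(self, key, value, x):
        acks = 0
        for replica in self.replicas:    # real systems send in parallel
            if replica.store(key, value):
                acks += 1
            if acks >= x:
                return True              # X acks reached: tell the client
        return False                     # not enough live replicas

replicas = [Replica() for _ in range(3)]
replicas[2].alive = False                          # one replica is down
print(Coordinator(replicas).write("k", "v", x=2))  # True: 2 acks reached
```

With hinted handoff (see Writes above), the coordinator would additionally keep the write intended for the down replica and replay it once that replica recovers; for reads, it would reconcile the X responses by timestamp.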
####Membership
- Any server in cluster could be the coordinator
- So every server needs to maintain a list of all the other servers currently in the cluster
- List needs to be updated automatically as servers join, leave, and fail
- Cassandra uses gossip-based cluster membership (sketch below)
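A sketch of gossip-style membership as mentioned above: every node keeps a heartbeat counter per member, periodically bumps its own counter and pushes its table to one random peer, and receivers keep the higher heartbeat (failure detection via heartbeat timeouts is omitted):

```python
import random

class Node:
    """Gossip-based membership: heartbeat table merged with random peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []                  # filled in once all nodes exist
        self.table = {name: 0}           # member -> heartbeat counter

    def gossip_round(self):
        self.table[self.name] += 1       # bump own heartbeat
        random.choice(self.peers).merge(self.table)

    def merge(self, remote):
        for member, beat in remote.items():
            if beat > self.table.get(member, -1):
                self.table[member] = beat    # higher heartbeat = fresher info

nodes = {n: Node(n) for n in ("n1", "n2", "n3")}
for node in nodes.values():
    node.peers = [p for p in nodes.values() if p is not node]
for _ in range(10):        # a few rounds spread every member with high probability
    for node in nodes.values():
        node.gossip_round()
print(sorted(nodes["n1"].table))  # ['n1', 'n2', 'n3']
```

A member whose heartbeat stops advancing for longer than a timeout is marked failed and eventually removed from the list.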
####vs. RDBMS
- On > 50 GB of data:
- MySQL: writes 300 ms avg, reads 350 ms avg
- Cassandra: writes 0.12 ms avg, reads 15 ms avg
##Design a CDN
##Design a mobile application