- Systems today are generally distributed, both in-house and third-party systems
- Stream processing is like batch processing in smaller increments; it isn't strictly real-time, but it is quick
- Project, work as pre and post work
- Reliability
- Scalability
- Maintainability
- 2^10 = 1,024 ≈ 1,000
- How much RAM is addressable with 32 bits: 2^32 bytes = 4 GB
- 11 Base64 characters: each character is one of 2^6 values, so 6 bits of information each, 66 bits total
- byte: 8 bits; register: 32 or 64 bits; cache line: 32 or 64 bytes; disk sector: 512 B (or 4 KB on newer disks); filesystem block: typically 4 KB
- Roughly 1 clock cycle per instruction; L1 cache reference: ~0.7 ns, L2 cache reference: ~7 ns
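A quick check of these numbers in Python (the cache latencies above are ballpark figures that vary by hardware):

```python
GB = 2**30

print(2**10)         # 1024 -> "about a thousand"
print(2**32 // GB)   # 4    -> a 32-bit address space covers 4 GB of RAM
print(11 * 6)        # 66   -> 11 Base64 characters at 6 bits each
```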
- Threads share memory and processes get their own; threads can be used more flexibly, while a process is an active (running) program
- Processes can't share each other's memory directly; each gets its own virtual address space
- Language-specific serialization (marshal, pickle, etc.): can be insecure, can be slow, and ties you to that language's serialization format (see the sketch after these bullets)
- Need to consider backward compatibility (new version reads old schema) and forward compatibility (old version reads new schema)
- Does communication use a textual or a binary format?
- Textual: no built-in data validation, larger size, but easier to read and debug
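A minimal Python sketch of the trade-off, using a hypothetical `User` record; pickle is language-specific binary (and unsafe to load from untrusted sources), JSON is textual and language-neutral but larger and unvalidated:

```python
import json
import pickle
from dataclasses import dataclass, asdict

@dataclass
class User:          # hypothetical record type
    name: str
    age: int

u = User("Ada", 36)

# Language-specific: binary, Python-only, and unpickling untrusted
# data can execute arbitrary code.
blob = pickle.dumps(u)
print(pickle.loads(blob))

# Textual: larger and unvalidated, but readable and language-neutral.
text = json.dumps(asdict(u))
print(json.loads(text))
```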
- single leader replication:
- lower latency
- durability/availability/fault tolerance
- higher read throughput
- Simple solution: primary and replica db
- Synchronous replication: the replica is written first, then the transaction is considered OK
- Asynchronous replication: the leader is written and the transaction is considered successful; the replicas then apply the write asynchronously
- Ship statements to replicas: this works, but breaks down with non-deterministic functions like random or time functions (see the sketch after these bullets)
- ship logs:
- write-ahead log: write to the log first, then apply the transaction
- these logs can just be shipped to the replicas and replayed on them
- The write-ahead log lives on disk; writing to it is still fast because the writes are sequential appends
- Logical (row-based) data shipping: build a logical representation of the changed rows rather than shipping the raw statements
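A toy Python sketch (not a real replication protocol) of why replaying statements breaks with non-deterministic functions, while shipping the resulting row values stays deterministic:

```python
import random
import time

# Statement-based: the replica re-runs the *statement*.
leader_db, replica_db = [], []
stmt = lambda: {"id": random.randint(0, 10**6), "ts": time.time()}
leader_db.append(stmt())
replica_db.append(stmt())          # re-evaluated -> different id / timestamp
print(leader_db == replica_db)     # almost certainly False

# Logical (row-based): ship the *resulting row* instead.
leader_db2, replica_db2 = [], []
row = {"id": random.randint(0, 10**6), "ts": time.time()}
leader_db2.append(row)
replica_db2.append(dict(row))      # same values on both
print(leader_db2 == replica_db2)   # True
```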
- Fully synchronous replication: takes a long time and increases load; when replicas run slowly, transactions take forever
- Semi-synchronous replication: wait for only one (or some) of the replicas to acknowledge the write and then consider it successful
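A rough sketch of the semi-synchronous idea using a thread pool; `replica_write` and the replica names are made up, and the commit is acknowledged once the first replica confirms:

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def replica_write(name: str, value: str) -> str:
    time.sleep(random.uniform(0.01, 0.5))   # stand-in for network write latency
    return name

pool = ThreadPoolExecutor(max_workers=3)

def commit(value: str, replicas=("r1", "r2", "r3")) -> str:
    futures = [pool.submit(replica_write, r, value) for r in replicas]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    # Consider the write successful once one replica confirms;
    # the remaining replicas keep writing in the background.
    return next(iter(done)).result()

print("acknowledged by", commit("x = 1"))
```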
- Partitioning spreads out data, this can be independent of replication.
- Partition boundaries are chosen, can be key-based. This is similar to the alphabetical index of an encyclopedia
- The partition boundaries can be fixed and manually set or rebalanced automatically
- Key-range partitioning can lead to hot spots, e.g. timestamp keys mean most writes hit the partition holding the most recent range
- Hash-based partitioning: hash the key and choose the partition based on the hash (see the sketch below)
- A disadvantage is that range-based queries get spread out across many partitions
- Keys that are hot spots can have a small random number appended or prepended, which alleviates some of the hot-spot pressure; this is usually applied selectively, since it spreads out related data and reads must then check every variant of the key
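A toy sketch of hash partitioning plus salting a known hot key; the partition count, salt range, and key names are made-up values:

```python
import hashlib
import random

NUM_PARTITIONS = 8
HOT_KEYS = {"celebrity_user"}      # keys known to be hot (assumed, not detected)
SALT_RANGE = 10

def partition(key: str) -> int:
    # Stable hash -> partition number; note range queries now scatter.
    h = hashlib.md5(key.encode()).hexdigest()
    return int(h, 16) % NUM_PARTITIONS

def write_key(key: str) -> int:
    if key in HOT_KEYS:
        key = f"{key}#{random.randrange(SALT_RANGE)}"   # salt the hot key
    return partition(key)

def read_hot_key_partitions(key: str) -> set[int]:
    # Reads of a salted key must check every salt suffix.
    return {partition(f"{key}#{i}") for i in range(SALT_RANGE)}

print(write_key("celebrity_user"), read_hot_key_partitions("celebrity_user"))
```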
- Secondary indexes: typically kept per partition (local to the partition), so index queries may have to hit every partition
- Rebalancing: a good strategy to prepare for nodes being added later is to create more partitions per node than strictly needed, so whole partitions can be handed to new nodes (see the sketch below)
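A sketch of that pre-splitting idea, assuming made-up counts of 12 partitions and 3 nodes; adding a node moves only a few whole partitions rather than rehashing keys:

```python
from collections import defaultdict

NUM_PARTITIONS = 12
assignment = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}   # 4 per node

def add_node(assignment, new_node):
    nodes = set(assignment.values()) | {new_node}
    target = NUM_PARTITIONS // len(nodes)        # fair share per node afterwards
    per_node = defaultdict(list)
    for p, n in assignment.items():
        per_node[n].append(p)
    moved = []
    # Hand partitions from over-loaded nodes to the new node until it has its share.
    for parts in per_node.values():
        while len(parts) > target and len(moved) < target:
            p = parts.pop()
            assignment[p] = new_node
            moved.append(p)
    return moved

print("partitions moved:", add_node(assignment, "node-3"))   # only 3 of 12 move
```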
- Service discovery: how does a client find which node has the data? It can be handled by letting the nodes themselves know the mapping (and forward requests), by a routing tier/table, or by making the client aware of the mapping
- Kademlia: early P2P distributed hash table, roughly 1 hop per bit of the ID https://en.wikipedia.org/wiki/Kademlia
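A tiny sketch of Kademlia's XOR distance metric, which is what lets each routing hop fix roughly one more bit of the target ID:

```python
def xor_distance(a: int, b: int) -> int:
    # Kademlia measures "closeness" between node IDs by XOR.
    return a ^ b

node_id, target_id = 0b10110110, 0b10111100
d = xor_distance(node_id, target_id)
print(bin(d), d.bit_length())   # shared prefix bits contribute nothing to the
                                # distance; each routing hop roughly halves it
```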
- DB Transactions should be isolated from each other
- Queries should not see in-flight data before it's committed
- ACID: Atomicity, Consistency, Isolation, and Durability: sometimes loosely defined, but guaranteed by some databases
- The write-ahead log helps achieve atomicity and durability
- The standard is serializable isolation:
- Fully serial execution means not starting one transaction until the previous one has finished
- Locking can make execution appear fully serial; the lock manager does "two-phase" locking, tracking read and write locks: acquire locks as the transaction runs, then release them only once it ends
- Locking usually happens at a granular level: lock the table, row, or page (disk chunk), taken as the transaction makes changes
- Can be slow, either from acquiring too many locks or from a long queue building up behind a large number of locks
- Deadlock is when two transactions each hold a lock that the other is waiting on (see the sketch below)
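A small Python sketch of a deadlock: two "transactions" take the same two locks in opposite order and end up waiting on each other (timeouts are used so the example terminates):

```python
import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()

def txn_1():
    with lock_a:                               # holds A...
        time.sleep(0.1)
        got = lock_b.acquire(timeout=1)        # ...then waits for B
        print("txn_1 got B:", got)
        if got:
            lock_b.release()

def txn_2():
    with lock_b:                               # holds B...
        time.sleep(0.1)
        got = lock_a.acquire(timeout=1)        # ...then waits for A -> deadlock
        print("txn_2 got A:", got)
        if got:
            lock_a.release()

t1, t2 = threading.Thread(target=txn_1), threading.Thread(target=txn_2)
t1.start(); t2.start(); t1.join(); t2.join()
# Both acquires time out: each transaction is waiting on the other's lock.
```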
- With stricter database isolation, transactions sometimes need to be retried on conflict (most ORMs don't do this)
- The database needs to be explicitly set to serializable
- Read committed means that only committed data can be read: no dirty reads or dirty writes
- Phantom read: brand new data was inserted into a range you were depending on; this can happen under read committed
- "snapshot isolation": store all the data and only expose
- linearizability: loosely defined as the system acting as if there is only one copy of the data
- http://jepsen.io/ finds faults in distributed systems
- CAP theorem:
- Consistency: clients see the same data
- Availability: system continues when nodes are down
- Partition tolerance: system continues even through network failures
- Full linearizability is achievable, but you may have to give up some availability
- Making a system fully linearizable requires a lot of coordination overhead, and it will always be limited by the slowest node
- Eventual consistency: at some point the system will become consistent
- Causal consistency: the system doesn't need every action to be in a single global sequence, but causally related actions need to be consistent within a certain constraint; for instance, each user has their data on one server
- exercise:
- linearizable: 2,5
- eventual consistency home: 0,1,2,3,4,5 visitors: 0,1,2
- consistent prefix:
- Could get split brain: with leader replication, the leader fails, a new leader takes over and accepts updates, then the previous leader comes back up still thinking it is the leader
- two-phased commit: Distributed transactions
- first phase: can I commit?
- second phase: actually send the commit (or abort) message
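A minimal two-phase commit sketch with in-memory participants standing in for nodes; a real coordinator would also log its decision and handle timeouts and crashes. The `amount >= 0` check is just a made-up local constraint:

```python
class Participant:
    def prepare(self, txn) -> bool:
        # Phase 1: vote yes only if this node can definitely apply the write.
        return txn.get("amount", 0) >= 0

    def commit(self, txn):
        print("committed", txn)

    def abort(self, txn):
        print("aborted", txn)

def two_phase_commit(participants, txn) -> bool:
    # Phase 1: ask everyone "can you commit?"
    if all(p.prepare(txn) for p in participants):
        # Phase 2: every vote was yes -> tell everyone to commit.
        for p in participants:
            p.commit(txn)
        return True
    # Any "no" vote -> tell everyone to abort.
    for p in participants:
        p.abort(txn)
    return False

two_phase_commit([Participant(), Participant()], {"amount": 10})
```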
- Should there be one leader/coordinator? Which node it is can change, tracked by an epoch/timestamp/number
- nodes can elect a leader
- Raft consensus algorithm:
- All nodes start as followers
- They elect a new leader if they don't hear from the current one; any node whose timeout expires can try to become the leader
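A very rough sketch of Raft's randomized election timeout (just the trigger for becoming a candidate, not the full algorithm with terms, votes, and log comparisons):

```python
import random
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.state = "follower"                    # all nodes start as followers
        self.timeout = random.uniform(0.15, 0.30)  # randomized election timeout
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        self.last_heartbeat = time.monotonic()     # leader is alive, reset timer

    def tick(self):
        if self.state == "follower" and time.monotonic() - self.last_heartbeat > self.timeout:
            self.state = "candidate"               # start an election, request votes
            print(f"{self.name} timed out -> requesting votes")

nodes = [Node(f"n{i}") for i in range(3)]
time.sleep(0.35)              # no heartbeats arrive...
for n in nodes:
    n.tick()                  # ...so followers whose timers expired become candidates
```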