- Systems today are generally distributed, both in-house and third-party systems
- Stream processing is like batch processing in smaller increments; it isn't strictly real-time, but it is quick
- Project, work as pre and post work
- Reliability
- Scalability
- Maintainability
- 2^10 = 1,024 ≈ 1,000
- How much RAM is addressable with 32 bits: 2^32 bytes = 4 GB
- 11 Base64 characters: each character is one of 2^6 values, so 6 bits of information each, 66 bits total
- byte: 8 bits; register: 32 or 64 bits; cache line: 32 or 64 bytes; disk sector: 512 B (or 4 KB on newer disks); filesystem block: typically 4 KB
- Roughly 1 clock cycle per instruction; L1 cache reference: ~0.7 ns, L2 cache reference: ~7 ns
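A quick check of these numbers in Python (the cache latencies above are ballpark figures that vary by hardware):

```python
GB = 2**30

print(2**10)         # 1024 -> "about a thousand"
print(2**32 // GB)   # 4    -> a 32-bit address space covers 4 GB of RAM
print(11 * 6)        # 66   -> 11 Base64 characters at 6 bits each
```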
- Threads share memory and processes get their own; threads can be used more flexibly, while a process is an active (running) program
- Processes can't share each other's memory directly; each gets its own virtual address space
- Language-specific serialization (marshal, pickle, etc.): can be insecure, can be slow, and ties you to that language's serialization format (see the sketch after these bullets)
- Need to consider backward compatibility (new version reads old schema) and forward compatibility (old version reads new schema)
- Does communication use a textual or a binary format?
- Textual: no built-in data validation, larger size, but easier to read and debug
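A minimal Python sketch of the trade-off, using a hypothetical `User` record; pickle is language-specific binary (and unsafe to load from untrusted sources), JSON is textual and language-neutral but larger and unvalidated:

```python
import json
import pickle
from dataclasses import dataclass, asdict

@dataclass
class User:          # hypothetical record type
    name: str
    age: int

u = User("Ada", 36)

# Language-specific: binary, Python-only, and unpickling untrusted
# data can execute arbitrary code.
blob = pickle.dumps(u)
print(pickle.loads(blob))

# Textual: larger and unvalidated, but readable and language-neutral.
text = json.dumps(asdict(u))
print(json.loads(text))
```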
- single leader replication:
- lower latency
- durability/availability/fault tolerance
- higher read throughput
- Simple solution: primary and replica db
- Synchronous replication: the replica is written first, then the transaction is considered OK
- Asynchronous replication: the leader is written and the transaction is considered successful; the replicas then apply the write asynchronously
- Ship statements to replicas: this works, but breaks down with non-deterministic functions like random or time functions (see the sketch after these bullets)
- ship logs:
- write-ahead log: write to the log first, then apply the transaction
- these logs can just be shipped to the replicas and replayed on them
- The write-ahead log lives on disk; writing to it is still fast because the writes are sequential appends
- Logical (row-based) data shipping: build a logical representation of the changed rows rather than shipping the raw statements
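A toy Python sketch (not a real replication protocol) of why replaying statements breaks with non-deterministic functions, while shipping the resulting row values stays deterministic:

```python
import random
import time

# Statement-based: the replica re-runs the *statement*.
leader_db, replica_db = [], []
stmt = lambda: {"id": random.randint(0, 10**6), "ts": time.time()}
leader_db.append(stmt())
replica_db.append(stmt())          # re-evaluated -> different id / timestamp
print(leader_db == replica_db)     # almost certainly False

# Logical (row-based): ship the *resulting row* instead.
leader_db2, replica_db2 = [], []
row = {"id": random.randint(0, 10**6), "ts": time.time()}
leader_db2.append(row)
replica_db2.append(dict(row))      # same values on both
print(leader_db2 == replica_db2)   # True
```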
- Fully synchronous replication: takes a long time and increases load; when replicas run slowly, transactions take forever
- Semi-synchronous replication: wait for only one (or some) of the replicas to acknowledge the write and then consider it successful
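A rough sketch of the semi-synchronous idea using a thread pool; `replica_write` and the replica names are made up, and the commit is acknowledged once the first replica confirms:

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def replica_write(name: str, value: str) -> str:
    time.sleep(random.uniform(0.01, 0.5))   # stand-in for network write latency
    return name

pool = ThreadPoolExecutor(max_workers=3)

def commit(value: str, replicas=("r1", "r2", "r3")) -> str:
    futures = [pool.submit(replica_write, r, value) for r in replicas]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    # Consider the write successful once one replica confirms;
    # the remaining replicas keep writing in the background.
    return next(iter(done)).result()

print("acknowledged by", commit("x = 1"))
```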
- Partitioning spreads out data, this can be independent of replication.
- Partition boundaries are chosen, can be key-based. This is similar to the alphabetical index of an encyclopedia
- The partition boundaries can be fixed and manually set or rebalanced automatically
- Key-range partitioning can lead to hot spots, e.g. timestamp keys mean most writes hit the partition holding the most recent range
- Hash-based partitioning: hash the key and choose the partition based on the hash (see the sketch below)
- A disadvantage is that range-based queries get spread out across many partitions
- Keys that are hot spots can have a small random number appended or prepended, which alleviates some of the hot-spot pressure; this is usually applied selectively, since it spreads out related data and reads must then check every variant of the key
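A toy sketch of hash partitioning plus salting a known hot key; the partition count, salt range, and key names are made-up values:

```python
import hashlib
import random

NUM_PARTITIONS = 8
HOT_KEYS = {"celebrity_user"}      # keys known to be hot (assumed, not detected)
SALT_RANGE = 10

def partition(key: str) -> int:
    # Stable hash -> partition number; note range queries now scatter.
    h = hashlib.md5(key.encode()).hexdigest()
    return int(h, 16) % NUM_PARTITIONS

def write_key(key: str) -> int:
    if key in HOT_KEYS:
        key = f"{key}#{random.randrange(SALT_RANGE)}"   # salt the hot key
    return partition(key)

def read_hot_key_partitions(key: str) -> set[int]:
    # Reads of a salted key must check every salt suffix.
    return {partition(f"{key}#{i}") for i in range(SALT_RANGE)}

print(write_key("celebrity_user"), read_hot_key_partitions("celebrity_user"))
```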
- Secondary indexes: typically kept per partition (local to the partition), so index queries may have to hit every partition
- Rebalancing: a good strategy to prepare for nodes being added later is to create more partitions per node than strictly needed, so whole partitions can be handed to new nodes (see the sketch below)
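A sketch of that pre-splitting idea, assuming made-up counts of 12 partitions and 3 nodes; adding a node moves only a few whole partitions rather than rehashing keys:

```python
from collections import defaultdict

NUM_PARTITIONS = 12
assignment = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}   # 4 per node

def add_node(assignment, new_node):
    nodes = set(assignment.values()) | {new_node}
    target = NUM_PARTITIONS // len(nodes)        # fair share per node afterwards
    per_node = defaultdict(list)
    for p, n in assignment.items():
        per_node[n].append(p)
    moved = []
    # Hand partitions from over-loaded nodes to the new node until it has its share.
    for parts in per_node.values():
        while len(parts) > target and len(moved) < target:
            p = parts.pop()
            assignment[p] = new_node
            moved.append(p)
    return moved

print("partitions moved:", add_node(assignment, "node-3"))   # only 3 of 12 move
```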
- Service discovery: how does a client find which node has the data? It can be handled by letting the nodes themselves know the mapping (and forward requests), by a routing tier/table, or by making the client aware of the mapping
- Kademlia: early P2P distributed hash table, roughly 1 hop per bit of the ID https://en.wikipedia.org/wiki/Kademlia
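A tiny sketch of Kademlia's XOR distance metric, which is what lets each routing hop fix roughly one more bit of the target ID:

```python
def xor_distance(a: int, b: int) -> int:
    # Kademlia measures "closeness" between node IDs by XOR.
    return a ^ b

node_id, target_id = 0b10110110, 0b10111100
d = xor_distance(node_id, target_id)
print(bin(d), d.bit_length())   # shared prefix bits contribute nothing to the
                                # distance; each routing hop roughly halves it
```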
- DB Transactions should be isolated from each other
- Queries should not see in-flight data before it's committed
- ACID: Atomicity, Consistency, Isolation, and Durability: sometimes loosely defined, but guaranteed by some databases
- The write-ahead log helps achieve atomicity and durability
- The standard is serializable isolation:
- Fully serial execution means not starting one transaction until the previous one has finished
- Locking can make execution appear fully serial; the lock manager does "two-phase" locking, tracking read and write locks: acquire locks as the transaction runs, then release them only once it ends
- Locking usually happens at a granular level: lock the table, row, or page (disk chunk), taken as the transaction makes changes
- Can be slow, either from acquiring too many locks or from a long queue building up behind a large number of locks
- Deadlock is when two transactions each hold a lock that the other is waiting on (see the sketch below)
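A small Python sketch of a deadlock: two "transactions" take the same two locks in opposite order and end up waiting on each other (timeouts are used so the example terminates):

```python
import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()

def txn_1():
    with lock_a:                               # holds A...
        time.sleep(0.1)
        got = lock_b.acquire(timeout=1)        # ...then waits for B
        print("txn_1 got B:", got)
        if got:
            lock_b.release()

def txn_2():
    with lock_b:                               # holds B...
        time.sleep(0.1)
        got = lock_a.acquire(timeout=1)        # ...then waits for A -> deadlock
        print("txn_2 got A:", got)
        if got:
            lock_a.release()

t1, t2 = threading.Thread(target=txn_1), threading.Thread(target=txn_2)
t1.start(); t2.start(); t1.join(); t2.join()
# Both acquires time out: each transaction is waiting on the other's lock.
```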
- With stricter database isolation, transactions sometimes need to be retried on conflict (most ORMs don't do this)
- The database needs to be explicitly set to serializable
- Read committed means that only committed data can be read: no dirty reads or dirty writes
- Phantom read: brand new data was inserted into a range you were depending on; this can happen under read committed
- "snapshot isolation": store all the data and only expose
- linearizability: loosely defined as the system acting as if there is only one copy of the data
- http://jepsen.io/ finds faults in distributed systems
- CAP theorem:
- Consistency: clients see the same data
- Availability: system continues when nodes are down
- Partition tolerance: system continues even through network failures
- Full linearizability is achievable, but you may have to give up some availability
- Making a system fully linearizable requires a lot of coordination overhead, and it will always be limited by the slowest node
- Eventual consistency: at some point the system will become consistent
- Causal consistency: the system doesn't need every action to be in a single global sequence, but causally related actions need to be consistent within a certain constraint; for instance, each user has their data on one server
- exercise:
- linearizable: 2,5
- eventual consistency home: 0,1,2,3,4,5 visitors: 0,1,2
- consistent prefix:
- Could get split brain: with leader replication, the leader fails, a new leader takes over and accepts updates, then the previous leader comes back up still thinking it is the leader
- two-phased commit: Distributed transactions
- first phase: can I commit?
- second phase: actually send the commit (or abort) message
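A minimal two-phase commit sketch with in-memory participants standing in for nodes; a real coordinator would also log its decision and handle timeouts and crashes. The `amount >= 0` check is just a made-up local constraint:

```python
class Participant:
    def prepare(self, txn) -> bool:
        # Phase 1: vote yes only if this node can definitely apply the write.
        return txn.get("amount", 0) >= 0

    def commit(self, txn):
        print("committed", txn)

    def abort(self, txn):
        print("aborted", txn)

def two_phase_commit(participants, txn) -> bool:
    # Phase 1: ask everyone "can you commit?"
    if all(p.prepare(txn) for p in participants):
        # Phase 2: every vote was yes -> tell everyone to commit.
        for p in participants:
            p.commit(txn)
        return True
    # Any "no" vote -> tell everyone to abort.
    for p in participants:
        p.abort(txn)
    return False

two_phase_commit([Participant(), Participant()], {"amount": 10})
```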
- Should there be one leader/coordinator? Which node it is can change, tracked by an epoch/timestamp/number
- nodes can elect a leader
- Raft consensus algorithm:
- All nodes start as followers
- They elect a new leader if they don't hear from the current one; any node whose timeout expires can try to become the leader
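A very rough sketch of Raft's randomized election timeout (just the trigger for becoming a candidate, not the full algorithm with terms, votes, and log comparisons):

```python
import random
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.state = "follower"                    # all nodes start as followers
        self.timeout = random.uniform(0.15, 0.30)  # randomized election timeout
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        self.last_heartbeat = time.monotonic()     # leader is alive, reset timer

    def tick(self):
        if self.state == "follower" and time.monotonic() - self.last_heartbeat > self.timeout:
            self.state = "candidate"               # start an election, request votes
            print(f"{self.name} timed out -> requesting votes")

nodes = [Node(f"n{i}") for i in range(3)]
time.sleep(0.35)              # no heartbeats arrive...
for n in nodes:
    n.tick()                  # ...so followers whose timers expired become candidates
```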