Time, Clocks, and the Ordering of Events in a Distributed System
Consensus on Transaction Commit
An empirical study on the correctness of formally verified distributed systems
Byzantizing Paxos by Refinement
Impossibility of Distributed Consensus with One Faulty Process
an asynchronous consensus algorithm cannot be guaranteed to be both safe and live. This is called the FLP Impossibility Result
Blockchains from a Distributed Computing Perspective
Seeing is Believing: A Client-Centric Specification of Database Isolation
CASPaxos: Replicated State Machines without logs
On Designing and Deploying Internet-Scale Services
Composition: A Way to Make Proofs Harder mentioned here
Because serializability allows arbitrary reordering of operations (so long as the order appears atomic), it is not particularly useful in real applications. Most databases which claim to provide serializability actually provide strong serializability, which has the same time bounds as linearizability. To complicate matters further, what most SQL databases term the SERIALIZABLE consistency level actually means something weaker, like repeatable read, cursor stability, or snapshot isolation.
Consistency means linearizability, and in particular, a linearizable register.
Serializability, linearizability, and locality
Difference between Linearizability and Serializability
Linearizability and Serializability in context of Software Transactional Memory
Linearizability versus Serializability
Serializability is the traditional “I,” or isolation, in ACID. If users’ transactions each preserve application correctness (“C,” or consistency, in ACID), a serializable execution also preserves correctness. Therefore, serializability is a mechanism for guaranteeing database correctness.
Granted, serializability is (more or less) the most general means of maintaining database correctness. In what’s becoming one of my favorite “underground” (i.e., relatively poorly-cited) references, H.T. Kung and Christos Papadimitriou dropped a paper in SIGMOD 1979 on “An Optimality Theory of Concurrency Control for Databases.” In it, they essentially show that, if all you have are transactions’ syntactic modifications to database state (e.g., read and write) and no information about application logic, serializability is, in some sense, “optimal”: in effect, a schedule that is not serializable might modify the database state in a way that produces inconsistency for some (arbitrary) notion of correctness that is not known to the database.
Combining serializability and linearizability yields strict serializability: transaction behavior is equivalent to some serial execution, and the serial order corresponds to real time. For example, say I begin and commit transaction T1, which writes to item x, and you later begin and commit transaction T2, which reads from x. A database providing strict serializability for these transactions will place T1 before T2 in the serial ordering, and T2 will read T1’s write. A database providing serializability (but not strict serializability) could order T2 before T1.
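The real-time constraint in this example can be captured in a small checker. A minimal sketch (transaction names and timestamps are invented for illustration): a proposed serial order is strictly serializable only if no transaction that committed before another one began is placed after it.

```python
# Hypothetical sketch of the real-time constraint that strict
# serializability adds on top of plain serializability.

def respects_real_time(order, spans):
    """order: list of txn ids in proposed serial order.
    spans: txn id -> (invoke_time, commit_time).
    If A commits before B is invoked, A must precede B in the order."""
    pos = {t: i for i, t in enumerate(order)}
    for a, (_, commit_a) in spans.items():
        for b, (invoke_b, _) in spans.items():
            if a != b and commit_a < invoke_b and pos[a] > pos[b]:
                return False
    return True

# T1 commits before T2 is even invoked, as in the example above.
spans = {"T1": (0, 1), "T2": (2, 3)}
print(respects_real_time(["T1", "T2"], spans))  # True: strictly serializable
print(respects_real_time(["T2", "T1"], spans))  # False: serializable only
```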
As Herlihy and Wing note, “linearizability can be viewed as a special case of strict serializability where transactions are restricted to consist of a single operation applied to a single object.”
Linearizability, serializability, transaction isolation and consistency models
Please stop calling databases CP or AP
Moreover, databases with snapshot isolation/MVCC are intentionally non-linearizable, because enforcing linearizability would reduce the level of concurrency that the database can offer. For example, PostgreSQL’s SSI provides serializability but not linearizability, and Oracle provides neither. Just because a database is branded “ACID” doesn’t mean it meets the CAP theorem’s definition of consistency. Is this quote really correct?
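A toy MVCC model shows the quoted behavior: under snapshot isolation a reader sees the snapshot taken when its transaction began, so it can return a value that has already been overwritten in real time, which a linearizable register would forbid. This is a hand-rolled sketch, not any real database's implementation; the version store and timestamps are invented for clarity.

```python
# key -> list of (commit_ts, value) versions, as in a simple MVCC store
versions = {"x": [(0, "old")]}

def snapshot_read(key, snapshot_ts):
    """Return the latest version committed at or before snapshot_ts."""
    return max((v for v in versions[key] if v[0] <= snapshot_ts),
               key=lambda v: v[0])[1]

t_begin = 1                        # reader takes its snapshot here
versions["x"].append((2, "new"))   # writer commits after the snapshot
print(snapshot_read("x", t_begin))  # 'old': stale in real time, so not
                                    # linearizable -- but valid under SI
```

The stale read is not a bug: it is exactly the concurrency win the quote says snapshot isolation is trading linearizability for.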
Serializable but not linearizable
How does two phase commit recover from a participant's failure?
How ACID is the two-phase commit protocol?
Notes on distributed systems for youngbloods some additional commentary
Testing Distributed Systems for Linearizability
Scalability Cheatsheet: The Road to Paxos
The Limits of the CAP Theorem HN
The only time that a CAP-Available system would be available when a CAP-Consistent one would not is when one of the datacenters can’t talk to the other replicas, but can talk to clients, and the load balancer keeps sending it traffic.
the 'A' in CAP is boring. It does not mean what you think it means. Lynch et al. probably chose the definition because it's one for which the 'theorem' is both true and easy to prove. This is not the impossibility result with which designers of distributed systems should be most concerned.
The CAP Theorem - Foundation DB
Clarifications On The CAP Theorem And Data-Related Errors
Paxos actually solves a more general problem than 3PC.
Consensus on Transaction Commit
Delivering Billions of Messages Exactly Once HN
I don't want to ever see the phrase "Exactly Once" without several asterisks behind it. It might be exactly once from an "overall" point of view, but the client effectively needs infinitely durable infinite memory to perform the "distributed transaction" of acting on the message and responding to the server.
how do you cut a monolith in half? HN
Exactly-once Semantics are Possible: Here’s How Kafka Does it reddit nope
Providing API for building applications that have transactions and help with idempotent producing is really exciting. This will help lower a lot of pains associated with building stream processing systems. Doing it with very little performance overhead is amazing.
Indeed. Idempotent operations is the mainstay of managing "at least once" message delivery. If you had a true "exactly once" system, idempotency would be unnecessary.
Manage Kubernetes Clusters on AWS Using Kops
The problem with renting boxes is the hidden costs if you want to do it right. First of all, if you have anything mission-critical, you need to run it in a high-availability config. This is easy for stateless microservices, but when it comes to running your DB, you start renting three boxes instead of one or two and configuring them accordingly.

Then you set up your backup infrastructure for disaster recovery; Glacier needs a replacement after all. No problem, just more disks(?) on a few more boxes(?) and bacula(?), ideally in a different datacenter just to be on the safe side; it would be nasty if your whole rack got fried and your data with it. Don't forget to back up your configuration, all of it: load balancers, server environment variables, network (do you have an internal DNS?), crontabs, some businesses need their audit logs stored, etc.

On the infrastructure level there is lots and lots of stuff you can do and you won't ever really need AWS; you'll just spend significantly more time finding and administering the right solutions than just using the AWS solutions, where you'll find a treasure trove of great tutorials and can relatively cheaply pay for support.

If you then pay someone on top for 24/7 management/monitoring of your dedicated stack, so that your team doesn't have to get up at 3 am because one of your VMs' disks fills up with some stray logfile, many of the savings you had by setting it up on a dedicated server go out of the window, because the management partner needs to train their people to look into your infrastructure. AWS-only management partners are just light-years cheaper because they can streamline their processes much better. You could also hire your own team of admins...

Sure, AWS is a beast with its own surprises, but overall the cost/benefit ratio is still very fair even if you factor in all the "surprises" (many of which your management partner will probably know about).
Having layered support is really beneficial as well. If something is wonky with RDS, you get to call your management partner (if they didn't detect it before you), who, if they can't tackle it themselves, can call AWS technicians. This gets you much, much further than you would get elsewhere; the outside world is paying for (for example) Percona's consultants or someone similar if the problems grow over their team's head. Sure, at some point in a company's growth, depending on how technical the operation is, there might be a time where an admin team and colocation/dedicated boxes make sense, where AWS technicians will scratch their heads etc., especially if you have some very, very specific tasks you need to do. But for most people this is far off, if ever.
Serializability, linearizability, and locality
Recall that strict serializability is essentially serializability plus linearizability’s real-time constraint: transactions cannot be arbitrarily re-ordered, but must appear to take place atomically at some time between their invocation and completion. When we add real-time constraints to sequential consistency, we get linearizability: a local property. Why can’t we add real-time constraints to serializability and obtain locality? Why don’t real-time multi-object transactions compose?
We can view strict serializability as linearizability plus the multi-object transactions of serializability. But in another sense, linearizability is strict serializability with the constraint that transactions act on a single object, because that restriction provides locality.
Linearizability: A Correctness Condition for Concurrent Objects (1990)
Sequential consistency is equivalent to linearizability without condition L2. Serializability is analogous to sequential consistency with transactions, and strict serializability is analogous to linearizability with transactions. Sequential consistency, serializability, and strict serializability do not have the same locality and non-blocking properties as linearizability. Moreover, serializability and linearizability are for different domains. Serializability works well for databases because application developers should be able to easily express complex transactions. Linearizability is better for infrastructure in which the developer is willing to spend considerable effort to maximize concurrency.
Consensus Protocols: Two-Phase Commit
Consensus Protocols: Three-Phase Commit
3PC works very well when nodes may crash and come to a halt – leaving the protocol permanently when they encounter a fault. This is called the fail-stop fault model, and certainly describes a number of failures that we see every day. However, especially in networked systems, this isn’t the only way in which nodes crash. They may instead, upon encountering a fault, crash and then recover from the fault, to begin executing happily from the point that they left off (remember that, with stable storage that persists between crashes, there’s no reason that a restarted node couldn’t simply pick up the protocol from where it crashed). This is the fail-recover fault model, which is more general than fail-stop, and therefore a bit more tricky to deal with.
Similarly, heavy network load can delay the delivery of a message for an arbitrary period. In a synchronous network model, there is a bound on the amount of time a remote host can take to process and respond to a message. In an asynchronous model, no such bound exists. The key problem that asynchronicity causes is that timeouts can no longer be reliably used as a proxy for failure detection.
There are other ways that Paxos can go wrong. Acceptors need to keep a record of the highest proposal they have agreed to, and the value of any proposals they have accepted, in stable storage. If that storage should crash then the acceptor cannot take part in the protocol for fear of corrupting the execution (by effectively lying about the proposals it has already seen). This is a kind of Byzantine failure, where an acceptor deviates from the protocol.
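The acceptor state the quote is talking about can be made concrete. A toy sketch (names and structure are illustrative, not any particular Paxos implementation): the highest promised ballot and the accepted proposal must survive crashes, which is why they belong in stable storage and must be flushed before the acceptor replies. An acceptor that "forgets" its promise after a crash could accept a lower ballot and effectively lie about what it has seen.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot promised; must be in stable storage
        self.accepted = None  # (ballot, value) last accepted; also stable

    def prepare(self, ballot):
        """Phase 1: promise not to accept anything below `ballot`."""
        if ballot > self.promised:
            self.promised = ballot            # fsync before replying!
            return ("promise", self.accepted)
        return ("nack", self.promised)

    def accept(self, ballot, value):
        """Phase 2: accept unless we promised a higher ballot."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)   # fsync before replying!
            return "accepted"
        return "rejected"

a = Acceptor()
print(a.prepare(5))  # ('promise', None)
print(a.prepare(3))  # ('nack', 5) -- this refusal is exactly what a
                     # crashed, amnesiac acceptor would fail to give
```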
Rather than sacrifice correctness, which some variants (including the one I described) of 3PC do, Paxos sacrifices liveness, i.e. guaranteed termination, when the network is behaving asynchronously and terminates only when synchronicity returns.
In distributed systems, what is a simple explanation of the Paxos algorithm?
3PC is fail-stop resilient, but not fail-recover resilient. Unfortunately real life requires fail-recover and hence we need a more general solution. This is where Paxos comes in.
What happens if we mandate only one Leader at a time in Paxos, and also mandate that instead of majority, all nodes must vote? You are right – we get 2PC. 2PC is a specific case of Paxos.
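The claimed correspondence can be seen by parameterizing the quorum size. A toy sketch (function and vote names invented): with a majority quorum the decision rule is Paxos-style; set the quorum to all participants and the same rule becomes 2PC's unanimous commit.

```python
def decide(votes, quorum_size):
    """Generic quorum commit: Paxos uses a majority, 2PC uses unanimity."""
    yes = sum(1 for v in votes if v == "yes")
    return "commit" if yes >= quorum_size else "abort"

votes = ["yes", "yes", "no"]
majority = len(votes) // 2 + 1
print(decide(votes, majority))     # 'commit' -- majority quorum (Paxos-style)
print(decide(votes, len(votes)))   # 'abort'  -- unanimous quorum (2PC-style)
```

This is only the voting rule, of course; the quote's other condition (a single fixed leader, i.e. no competing proposers and no leader election) is what removes the rest of Paxos's machinery.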
Byzantine faults, Paxos, and consensus
Paxos is in a sense sort of like RSA. RSA is expensive, so you use it as a wrapper to a cheaper protocol. With Paxos you elect a leader to be in charge of a large-scale event. The coordinator is a single point of failure, so you can use paxos to come up with a new coordinator. Most of the time you don’t run paxos, you only run it when something bad happens to prevent any one thing to be a single point of failure.
Why is memory reclamation so important? HN
You Can’t Sacrifice Partition Tolerance
Fear and Loathing in lock-free programming
Google's Cloud Spanner: how does it stack up?
Spanner was originally built by Google to handle workloads like AdWords and Google Play, that were, according to Google, previously running on massive, manually sharded MySQL implementations
So Paxos is a better, more general version of 2PC, and unlike 3PC, it has been proved correct?
A Comparison of Advanced, Modern Cloud Databases
Concurrent ACID: Whether the database supports ACID (atomicity, consistency, isolation, and durability) guarantees across multiple operations. ACID is a powerful tool for system correctness, and until recently has been a long-sought but elusive chimera for distributed databases. I use the term “concurrent ACID” because technically Cosmos guarantees ACID, but only within the confines of a single operation.
modern techniques can achieve CP while still keeping availability that’s incredibly good. Like five or more 9s of good. This result is so optimal that modern databases seem to be converging on it. Every database on the list above is CP with varying levels of A
Time-based consistency. Sophisticated distributed systems like Spanner and CockroachDB tend to need a little more time to coordinate and verify the accuracy of the results that will be returned from any given node, and this makes them less suitable for low latency operations.
Spanner, TrueTime and the CAP Theorem HN
Why you should pick strong consistency, whenever possible
distributed systems reading list
A History of Transaction Histories
Data Laced with History: Causal Trees & Operational CRDTs
Standing on Distributed Shoulders of Giants
Tweets about distributed transactions
Distributed Systems, Failures, and Consensus
CIS 505: Software Systems Lecture Note on Consensus
Principles of Distributed Computing (lecture collection)
Distributed Systems W4995-1 Fall 2014 Roxana Geambasu lecture 17
3PC trades safety for liveness, 2PC trades liveness for safety. Paxos is "largely live" and blocks only in exceptional circumstances.
Replication theory and practice
Stumbling over Consensus Research: Misunderstandings and Issues
Implementing Trustworthy Services Using Replicated State Machines
Selected Results from the Latest Decade of Quorum Systems Research
Our concurrent past, our distributed future
Distributed Systems vs Compositionality—Roland Kuhn
Consistency without consensus in production systems https://www.youtube.com/watch?v=lsKaNDj4TrE
Martin Kleppmann - Conflict Resolution for Eventual Consistency GOTO 2015
Coordination-Free Computations GOTO 2014
How the Bitcoin Protocol Actually Works
Distributed Systems Theory for Practical Engineers
Applied Distributed Research in Apache Cassandra
Four Distributed Systems Architectural Patterns by Tim Berglund
What Came First: The Ordering of Events in Systems
Four Distributed Systems Architectural Patterns by Tim Berglund
[17:00] If you are a service, like Slack, which is composed of a series of organizations which are not as big...