Terminology for reasoning about consistency and availability trade-offs in distributed data systems.
Proposed by Martin Kleppmann in A Critique of the CAP Theorem.
Availability is an empirical metric, not a property of an algorithm. It is defined as the percentage of successful requests (returning a non-error response within a predefined latency bound) over some period of system operation.
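A minimal sketch of computing availability this way from observed request outcomes (the RequestOutcome record, its field names, and the 200 ms bound are illustrative assumptions, not anything specified in the paper):

```python
from dataclasses import dataclass

@dataclass
class RequestOutcome:
    ok: bool             # did the request return a non-error response?
    latency_ms: float    # observed response time

def availability(outcomes: list[RequestOutcome], latency_bound_ms: float) -> float:
    """Fraction of requests that returned a non-error response within the latency bound."""
    if not outcomes:
        return 0.0
    good = sum(1 for o in outcomes if o.ok and o.latency_ms <= latency_bound_ms)
    return good / len(outcomes)

# 2 of these 4 requests succeed within a 200 ms bound, so availability is 0.5:
# one request errored, and one succeeded but exceeded the bound.
sample = [RequestOutcome(True, 120), RequestOutcome(True, 90),
          RequestOutcome(False, 50), RequestOutcome(True, 300)]
print(availability(sample, latency_bound_ms=200))
```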
Delay-sensitive describes algorithms or operations that need to wait for network communication to complete, i.e. which have latency proportional to network delay. The opposite is delay-independent. Systems must specify the nature of the sensitivity (e.g. an operation may be sensitive to intra-datacenter delay but independent of inter-datacenter delay). A fully delay-independent system supports disconnected (offline) operation.
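The contrast can be sketched as two write paths: one that blocks on a network round trip, and one whose latency is purely local because replication happens in the background. The send_and_wait_for_ack parameter stands in for whatever replication RPC a particular system uses; it is an assumption for the example.

```python
import queue

local_store: dict[str, str] = {}
replication_log: queue.Queue = queue.Queue()

def delay_sensitive_write(key: str, value: str, send_and_wait_for_ack) -> None:
    """Latency is proportional to network delay: we block until a remote replica acknowledges."""
    send_and_wait_for_ack(key, value)   # network round trip sits on the critical path
    local_store[key] = value

def delay_independent_write(key: str, value: str) -> None:
    """Latency is purely local: the update is shipped to replicas asynchronously,
    so the operation keeps working even while disconnected (offline)."""
    local_store[key] = value
    replication_log.put((key, value))   # drained later by a background replication task
```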
Network faults encompass packet loss (both transient and long-lasting) and unusually large packet delay. Network partitions are just one particular type of network fault; in most cases, systems should plan for all kinds of network fault, and not only partitions. As long as lost packets or failed requests are retried, they can be modeled as large network delay.
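A small sketch of that modeling step, assuming a hypothetical send_once primitive that returns whether an acknowledgement arrived: each lost transmission simply adds a timeout's worth of latency to the observed delay, until the retry budget is exhausted.

```python
import random, time

def send_with_retry(send_once, payload, timeout_s: float = 1.0, max_attempts: int = 5) -> float:
    """Retransmit until acknowledged; a lost packet shows up only as extra latency."""
    start = time.monotonic()
    for _ in range(max_attempts):
        if send_once(payload):                  # True means an acknowledgement arrived
            return time.monotonic() - start     # observed delay, retries included
        time.sleep(timeout_s)                   # wait out the timeout, then retransmit
    raise TimeoutError("no acknowledgement after retries")

# Hypothetical lossy link: each transmission is acknowledged with 70% probability.
lossy_send = lambda payload: random.random() < 0.7
print(f"delivered after {send_with_retry(lossy_send, b'ping'):.2f}s")
```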
Fault tolerance is used in preference to high availability or partition tolerance. The maximum fault that can be tolerated must be specified (e.g. “the algorithm can tolerate up to a minority of replicas crashing or disconnecting”), and the description must also state what happens if more faults occur than the system can tolerate (e.g. all requests return an error, or a consistency property is violated).
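For the majority-quorum example quoted above, a minimal sketch of both halves of such a specification: the tolerated fault (a minority of replicas unreachable) and the stated behavior beyond it (requests are rejected with an error rather than silently weakening consistency). The function names are assumptions for the illustration.

```python
def can_make_progress(total_replicas: int, reachable_replicas: int) -> bool:
    """A majority-quorum system tolerates up to a minority of replicas failing:
    it keeps operating as long as a strict majority remains reachable."""
    return reachable_replicas > total_replicas // 2

def handle_write(total_replicas: int, reachable_replicas: int, apply_write):
    if not can_make_progress(total_replicas, reachable_replicas):
        # More faults than the algorithm tolerates: fail explicitly,
        # rather than violating a consistency property.
        raise RuntimeError("insufficient replicas reachable; request rejected")
    return apply_write()

# e.g. 5 replicas: 2 failures are tolerated (3 reachable), 3 failures are not (2 reachable).
assert can_make_progress(5, 3) is True
assert can_make_progress(5, 2) is False
```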
Consistency refers to a spectrum of different consistency models (including linearizability and causal consistency), not one particular consistency model. When a particular consistency model such as linearizability is intended, it is referred to by its usual name. The term strong consistency is vague, and may refer to linearizability, sequential consistency or one-copy serializability.