- Synchronous, with a change of partition tolerance
- State machine replication (e.g., blockchain consensus)
- Requires safety (nothing bad ever happens), and liveness (eventually something good happens)
- Thunderella
- If you don't know Thunderella, go look it up (check out the BPASE '18 talk)
- Wait, there's a flaw in Thunderella?!
- A confirmed transaction can be undone, even in a somewhat benign setting
- wtf? they wrote 74 pages of proofs, so how can a provably correct protocol be flawed?
- The proof is fine... the problem is the synchronous network model
- Thunderella gives two properties:
- With honest majority, you get consistency
- With 3/4 honest and an honest leader/accelerator, you get liveness & good performance
- Can also get liveness with an honest majority by...
- Falling back to a slow chain (blockchain-based) whenever the fast chain halts
- How does fallback happen?
- Committee on fast path posts periodic heartbeats to the slow chain
- Posts notarized (3/4 signed) hashes of fast path log
- Both serves as heartbeat & checkpoint
- If long number of blocks without heartbeat (
k
blocks), then all agree to fallback to slow chain- However, not everyone agrees on the number of transactions before the fast chain failed!
- Everyone post notarized but UNCHECKPOINTED transactions onto the slowchain to achieve consensus on where the fastchain halted
- Basically, everyone gets together and shares their notes on what was confirmed on the fast chain
- Now you can achieve both consistency and liveness—you get optimistic liveness!
- Thunderella two sentences: single round of voting when things are good, slow blockchain for fallback when things are bad
- Committee on fast path posts periodic heartbeats to the slow chain
- So what's the flaw??
- Leader makes a proposal for a block
- Everyone signs, leader collects the signatures and produces a notarized version of the block
- Leader starts broadcasting the notarization to one committee member
- But then leader crashes and goes offline before broadcasting to anyone else!
- Committee member considers the block finalized and for example, pays out an end user
- But that committee member then ALSO goes offline
- So the only people who knew/saw the notarized block are now offline
- The rest of the committee members go back to the slow chain, but because no one has those transactions...
- They continue moving forward as though the transactions never happened
- Thus, the transactions are REVERTED despite having been earlier confirmed.
- This is bad! This should not happen!
- Why is such a flaw possible in a provably correct protocol?
- Classical synchronous models permit flawed protocols
- What is the synchronous model?
- We assume that when honest nodes send messages, they arrive in a bounded amount of time
- (And the protocol is aware of this bound)
- Message delay = 1 round
- In the synchronous model, having a temporary outage means you're faulty
- In a synchronous model, a "faulty" node gets no guarantees about correctness!
- In the partially synchronous model, the message delay can be arbitrarily long
- Having a temporary outage is isomorphic to having a long message delay
- We say that partially synchronous models are arbitrarily partition-tolerant
- But ANY partially synchronous model cannot tolerate more than 1/3 corruptions
- Can we achieve the best of both worlds?
- We want to tolerate 1/2 corruptions (which we can do in the synchronous model, a la Nakamoto consensus)
- We want to let temporary outages = long message delays
- The answer should be no, given classical consensus protocols...
- We assume that when honest nodes send messages, they arrive in a bounded amount of time
- What if we are more subtle in how we model how "synchronous" a network is?
- Imagine we have some nodes that are honest and online
- But other nodes that are honest, with unstable connections
- Then we have corrupt nodes who we don't care about
- We want to achieve consensus among the first two groups: is this possible?
- Turns out, you cannot do this unless the honest and online group is LARGE
- greater than or equal to 1/2 of all nodes
- This makes some synchrony assumptions about the network, since you assume at least half of the nodes have fast connectivity
- But a minority of nodes can be unstable
- "Best-possible partition tolerance"
- But it's commonly assumed that these honest and online nodes are persistent...
- But Bitcoin has been around for 10 years—those nodes churn over long enough timespans
- More realistically: every node is offline at some point!
- Turns out, you cannot do this unless the honest and online group is LARGE
- You just need at least 1/2 nodes at all times to be continuously online from round to round
- How to fix it? — Making Thunderella Best-Possible Partition Tolerant
- For Fast Path:
- Before, you confirmed a transaction whenever you saw a notarized block
- But you don't know that you aren't the first person to see this and then go offline
- So instead, you only confirm transactions in notarized blocks ONE BLOCK BACK
- This ensures that all notaries must have seen the previous block
- And thus should be able to write it to the slow chain if you go offline
- Before, you confirmed a transaction whenever you saw a notarized block
- For Fast Path:
- New theoretical and practical results around "best-possible partition tolerance"
- Can be considered a practical network model
- Guarantees liveness only during "periods of synchrony"
- Some philosophical observations:
- This class of protocols with best-possible partition tolerance is a subset of "Synchronous honest majority protocols"
- None of the classical synchronous honest majority protocols are best-possible partition tolerant
- Classical corrupt majority protocols cannot be best-partition tolerant (e.g., Dolev Strong)
- Classical synchronous model is a mismatch for what we need in real-world deployments
- Can be considered a practical network model
- Thunder is launching in Q1 2019
- Whoo
Created
February 3, 2019 06:58
-
-
Save Haseeb-Qureshi/68df0a41af49376bed884cb3a7035832 to your computer and use it in GitHub Desktop.
New and Simple Consensus Algorithms for ThunderCore’s MainNet (SBC19)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment