Skip to content

Instantly share code, notes, and snippets.

@Haseeb-Qureshi
Created February 3, 2019 06:58
Show Gist options
  • Save Haseeb-Qureshi/68df0a41af49376bed884cb3a7035832 to your computer and use it in GitHub Desktop.
Save Haseeb-Qureshi/68df0a41af49376bed884cb3a7035832 to your computer and use it in GitHub Desktop.
New and Simple Consensus Algorithms for ThunderCore’s MainNet (SBC19)

New and Simple Consensus Algorithms for ThunderCore’s MainNet

Speaker: Elaine Shi

  • Synchronous, with a change of partition tolerance
  • State machine replication (e.g., blockchain consensus)
    • Requires safety (nothing bad ever happens), and liveness (eventually something good happens)
  • Thunderella
    • If you don't know Thunderella, go look it up (check out the BPASE '18 talk)
  • Wait, there's a flaw in Thunderella?!
    • A confirmed transaction can be undone, even in a somewhat benign setting
    • wtf? they wrote 74 pages of proofs, so how can a provably correct protocol be flawed?
      • The proof is fine... the problem is the synchronous network model
  • Thunderella gives two properties:
    • With honest majority, you get consistency
    • With 3/4 honest and an honest leader/accelerator, you get liveness & good performance
    • Can also get liveness with an honest majority by...
      • Falling back to a slow chain (blockchain-based) whenever the fast chain halts
  • How does fallback happen?
    • Committee on fast path posts periodic heartbeats to the slow chain
      • Posts notarized (3/4 signed) hashes of fast path log
      • Both serves as heartbeat & checkpoint
    • If long number of blocks without heartbeat (k blocks), then all agree to fallback to slow chain
      • However, not everyone agrees on the number of transactions before the fast chain failed!
      • Everyone post notarized but UNCHECKPOINTED transactions onto the slowchain to achieve consensus on where the fastchain halted
        • Basically, everyone gets together and shares their notes on what was confirmed on the fast chain
      • Now you can achieve both consistency and liveness—you get optimistic liveness!
    • Thunderella two sentences: single round of voting when things are good, slow blockchain for fallback when things are bad
  • So what's the flaw??
    • Leader makes a proposal for a block
    • Everyone signs, leader collects the signatures and produces a notarized version of the block
    • Leader starts broadcasting the notarization to one committee member
      • But then leader crashes and goes offline before broadcasting to anyone else!
    • Committee member considers the block finalized and for example, pays out an end user
    • But that committee member then ALSO goes offline
    • So the only people who knew/saw the notarized block are now offline
    • The rest of the committee members go back to the slow chain, but because no one has those transactions...
      • They continue moving forward as though the transactions never happened
      • Thus, the transactions are REVERTED despite having been earlier confirmed.
    • This is bad! This should not happen!
  • Why is such a flaw possible in a provably correct protocol?
    • Classical synchronous models permit flawed protocols
  • What is the synchronous model?
    • We assume that when honest nodes send messages, they arrive in a bounded amount of time
      • (And the protocol is aware of this bound)
    • Message delay = 1 round
    • In the synchronous model, having a temporary outage means you're faulty
      • In a synchronous model, a "faulty" node gets no guarantees about correctness!
    • In the partially synchronous model, the message delay can be arbitrarily long
      • Having a temporary outage is isomorphic to having a long message delay
    • We say that partially synchronous models are arbitrarily partition-tolerant
    • But ANY partially synchronous model cannot tolerate more than 1/3 corruptions
    • Can we achieve the best of both worlds?
      • We want to tolerate 1/2 corruptions (which we can do in the synchronous model, a la Nakamoto consensus)
      • We want to let temporary outages = long message delays
    • The answer should be no, given classical consensus protocols...
  • What if we are more subtle in how we model how "synchronous" a network is?
    • Imagine we have some nodes that are honest and online
    • But other nodes that are honest, with unstable connections
    • Then we have corrupt nodes who we don't care about
    • We want to achieve consensus among the first two groups: is this possible?
      • Turns out, you cannot do this unless the honest and online group is LARGE
        • greater than or equal to 1/2 of all nodes
        • This makes some synchrony assumptions about the network, since you assume at least half of the nodes have fast connectivity
          • But a minority of nodes can be unstable
          • "Best-possible partition tolerance"
      • But it's commonly assumed that these honest and online nodes are persistent...
        • But Bitcoin has been around for 10 years—those nodes churn over long enough timespans
      • More realistically: every node is offline at some point!
    • You just need at least 1/2 nodes at all times to be continuously online from round to round
  • How to fix it? — Making Thunderella Best-Possible Partition Tolerant
    • For Fast Path:
      • Before, you confirmed a transaction whenever you saw a notarized block
        • But you don't know that you aren't the first person to see this and then go offline
      • So instead, you only confirm transactions in notarized blocks ONE BLOCK BACK
        • This ensures that all notaries must have seen the previous block
        • And thus should be able to write it to the slow chain if you go offline
  • New theoretical and practical results around "best-possible partition tolerance"
    • Can be considered a practical network model
      • Guarantees liveness only during "periods of synchrony"
    • Some philosophical observations:
      • This class of protocols with best-possible partition tolerance is a subset of "Synchronous honest majority protocols"
      • None of the classical synchronous honest majority protocols are best-possible partition tolerant
    • Classical corrupt majority protocols cannot be best-partition tolerant (e.g., Dolev Strong)
      • Classical synchronous model is a mismatch for what we need in real-world deployments
  • Thunder is launching in Q1 2019
    • Whoo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment