@danrl
Created January 26, 2020 20:50
NEW: Acknowledge and confirm acceptance
Yes - I confirm that I will speak at SREcon19Asia
NEW: Public Talk Title
Implementing Distributed Consensus
NEW: Short description
May I introduce "Skinny", an education-focused, distributed lock service.
With the help of Skinny, we will...
...briefly look at the Paxos protocol
...see an example of a typical Paxos run
...design a simple distributed consensus protocol
...learn the tricky parts of implementing our simple distributed consensus protocol
...gradually move from theory-level to coding-level, solving small challenges (network, availability, fault-tolerance) along the way
This talk addresses engineers who have had little exposure to the inner workings of distributed consensus, who want to learn about distributed consensus as they start building distributed systems, or who have worked with ready-made distributed consensus solutions such as ZooKeeper and etcd but strive to understand the underlying theory as well.
Disclaimer: This work is not affiliated with any company (including Google) and is purely educational!
NEW: Speaker Name(s) for Public Posting
Dan Lüdtke (now at Google)
NEW: Bio for each presenter
Dan is a Site Reliability Manager in Munich. He contributes to open source software projects, regularly helps to organize large hacker events, runs an autonomous system for fun, and dreams of space travel. Prior to Google, Dan served his country, worked as a security consultant, joined a start-up, and wrote a book about IPv6.
Dan earned a master's degree in Computer Aided Engineering from the Munich University of the German Federal Armed Forces.
NEW: Speaker(s)' Slack Handle(s)
danrl
NEW: Speaker(s)' Twitter handles
danrl_com
[JPEG] NEW: Speaker(s)'s Headshot/portrait_small.jpg (1759kB)
Author (blind until review)
Dan Luedtke (Google) <[email protected]>
Keyword-Hash Tags
#consensus
#distributedsystems
#sre
#paxos
Track Choice
Core Principles Track
Proposal Type
45 minute talk
Long description
Distributed consensus protocols help machines agree on values, even if the network or some machines fail. That all sounds nice in theory but turns out to be tricky in practice. Join me in implementing a variation of the Paxos protocol. Lean back and enjoy while we learn about Skinny, a feature-free, educational distributed lock service. I already made all the stupid mistakes so you don’t have to. And I am sharing them with you, so we can all have a laugh together and learn something.
I would like to show different example scenarios and states of a distributed consensus system. Some of them allow recovery, some don’t. Since we control the source code, we can also provoke situations that are rare in real life but helpful for understanding the limitations of the protocol. The talk introduces protocol requirements alongside the actual code that implements them. How do we deal with slow networks? What happens if an instance fails unexpectedly? Can we recover from state loss? And if so, how do we guarantee majorities? Together, we will gradually answer these questions and make the implementation faster and more reliable. Distributed consensus is something we often use but rarely implement. Nevertheless, it is worth deepening our understanding of how it works.
But the learning doesn’t stop after the talk! The code is open source. It comes with ready-to-use Terraform modules and Ansible playbooks to set up your own lab in minutes. Near-complete test coverage helps you experiment with the code and immediately judge the outcome of your own improvements and modifications.
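As a small illustration of the majority question raised above, here is a minimal sketch of how a strict majority (quorum) can be computed for a cluster of n instances. The function names are hypothetical and not taken from the Skinny code base; the sketch only assumes the usual Paxos-style rule that a decision needs more than half of all instances.

```python
def quorum(n: int) -> int:
    """Smallest number of instances that forms a strict majority of n.

    Hypothetical helper for illustration; not part of Skinny.
    """
    if n < 1:
        raise ValueError("a cluster needs at least one instance")
    return n // 2 + 1


def has_majority(acks: int, n: int) -> bool:
    """True if `acks` positive responses are a strict majority of n."""
    return acks >= quorum(n)


def tolerated_failures(n: int) -> int:
    """How many instances may be down while a majority is still possible."""
    return n - quorum(n)
```

Because any two majorities of the same cluster share at least one instance, a value accepted by one majority cannot go unnoticed by the next. This is also why a five-instance lab can lose two instances and still make progress, while a four-instance lab can only lose one.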
Session Outline
Outline
Distributed Consensus Overview (2min)
Paxos (6min) [OPTIONAL]
Introducing Skinny (2min)
How Skinny reaches Consensus (5min)
How Skinny deals with Instance Failure (8min)
Skinny APIs (4min)
Implementation Challenge: Reaching out via Network (15min)
Implementation Challenge: Early Stopping (5min)
Implementation Challenge: Duelling Proposers (5min) [OPTIONAL]
Further Reading & Watching (2min)
Where to find the code and how to start your own lab (2min)
This proposal: ~46min. If the proposal is accepted, I would invest some time to streamline the topic here and there to make it fit the time slot.
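To give a flavor of the "Early Stopping" item in the outline, here is a minimal sketch. It assumes that early stopping means a proposer stops waiting for further responses as soon as the outcome is decided either way. The function name and the boolean response model are hypothetical and do not come from the Skinny code base; n is assumed to count all instances, including the proposer's own vote.

```python
from typing import Iterable, Tuple


def collect_promises(responses: Iterable[bool], n: int) -> Tuple[bool, int]:
    """Consume responses in arrival order and stop as early as possible.

    Returns (majority_reached, responses_consumed).
    Hypothetical helper for illustration; not part of Skinny.
    """
    needed = n // 2 + 1  # strict majority of n instances
    yes = 0
    no = 0
    for ok in responses:
        if ok:
            yes += 1
        else:
            no += 1
        if yes >= needed:
            # Majority reached: remaining responses cannot change the result.
            return True, yes + no
        if no > n - needed:
            # Too many rejections: a majority is no longer possible.
            return False, yes + no
    # Responses ran out (for example, due to timeouts) before a decision.
    return False, yes + no
```

With five instances, three positive responses are enough, so a slow fourth and fifth instance no longer delay the proposer. The same check in the other direction avoids waiting for a majority that can no longer materialize.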
Audience takeaways
During the talk, we (the audience and I) will examine the Skinny distributed lock service. I designed Skinny specifically for educational purposes.
The audience will learn about the Paxos protocol
The audience will see animated examples of typical Paxos runs (most people prefer visual explanations over written explanations)
The audience will learn how to implement a simple distributed consensus protocol as we gradually move from theory-level to coding-level, solving small challenges (network, availability, fault-tolerance) along the way.
The talk addresses engineers who have had little exposure to the inner workings of distributed consensus, who want to learn about distributed consensus as they start building distributed systems, or who have worked with ready-made distributed consensus solutions such as ZooKeeper and etcd but want to understand the theory as well.
Other notes for the program committee
Please note: This work is not affiliated with any company and is purely educational. I did this work before I joined Google.
A previous(!!) version of the talk is documented under the following link and ran for ~45min: Clicking this link will reveal my identity and un-blind the review!
How comfortable are you in speaking to large groups (>100 people)?
Fairly comfortable
Topics
Core Principles
Reliability and Resilience