Created
January 26, 2020 20:50
NEW: Acknowledge and confirm acceptance | |
Yes - I confirm that I will speak at SREcon19Asia | |
NEW: Public Talk Title | |
Implementing Distributed Consensus | |
NEW: Short description | |
May I introduce "Skinny", an education-focused, distributed lock service. | |
With the help of Skinny, we will... | |
...briefly look at the Paxos protocol | |
...see an example of a typical Paxos run | |
...design a simple distributed consensus protocol | |
...learn the tricky parts of implementing our simple distributed consensus protocol | |
...gradually move from theory-level to coding-level, solving small challenges (network, availability, fault-tolerance) along the way | |
This talk addresses engineers who have had little exposure to the inner workings of distributed consensus, who want to learn about distributed consensus as they start building distributed systems, or who have worked with ready-made distributed consensus solutions such as ZooKeeper and etcd but want to understand the underlying theory as well. | |
Disclaimer: This work is not affiliated with any company (including Google) and is purely educational! | |
NEW: Speaker Name(s) for Public Posting | |
Dan Lüdtke (now at Google) | |
NEW: Bio for each presenter | |
Dan is a Site Reliability Manager in Munich. He contributes to open source software projects, regularly helps to organize large hacker events, runs an autonomous system for fun, and dreams of space travel. Prior to Google, Dan served his country, worked as a security consultant, joined a start-up, and wrote a book about IPv6. | |
Dan earned a master's degree in Computer Aided Engineering from the Munich University of the German Federal Armed Forces. | |
NEW: Speaker(s)' Slack Handle(s) | |
danrl | |
NEW: Speaker(s)' Twitter handles | |
danrl_com | |
[JPEG] NEW: Speaker(s)' Headshot/portrait_small.jpg (1759kB) | |
Author (blind until review) | |
Dan Luedtke (Google) <[email protected]> | |
Keyword-Hash Tags | |
#consensus | |
#distributedsystems | |
#sre | |
#paxos | |
Track Choice | |
Core Principles Track | |
Proposal Type | |
45 minute talk | |
Long description | |
Distributed consensus protocols help machines agree on values, even if the network or some machines fail. That all sounds nice in theory but turns out to be tricky in practice. Join me in implementing a variation of the Paxos protocol. Lean back and enjoy while we learn about Skinny, a feature-free, educational distributed lock service. I already made all the stupid mistakes so you don’t have to. And I am sharing them with you, so we can all have a laugh together and learn something. | |
I would like to show different example scenarios and states of a distributed consensus system. Some of them allow recovery, some don’t. Since we control the source code, we can also provoke situations that are rare in real life but helpful for understanding the limitations of the protocol. The talk introduces protocol requirements alongside the actual code that implements them. How do we deal with slow networks? What happens if an instance fails unexpectedly? Can we recover from state loss? And if so, how do we guarantee majorities? Together, we will gradually answer these questions and make the implementation faster and more reliable. Distributed consensus is something we often use but rarely implement. Nevertheless, it is worth deepening our understanding of how it works. | |
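To make the question of guaranteeing majorities concrete, here is a minimal Python sketch of a Paxos-style promise check. It is an illustration under stated assumptions, not Skinny's actual code: the names `majority`, `Acceptor`, `prepare`, and `has_quorum` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


def majority(cluster_size: int) -> int:
    """Smallest number of instances that forms a strict majority."""
    return cluster_size // 2 + 1


@dataclass
class Acceptor:
    """Hypothetical per-instance state (an assumption, not Skinny's data model)."""
    promised: int = 0  # highest round number this instance has promised

    def prepare(self, round_id: int) -> bool:
        """Promise the round only if it is newer than anything promised before."""
        if round_id > self.promised:
            self.promised = round_id
            return True
        return False


def has_quorum(promises: Iterable[bool], cluster_size: int) -> bool:
    """A proposal may proceed only if a strict majority promised."""
    return sum(1 for p in promises if p) >= majority(cluster_size)
```

Because any two strict majorities of the same cluster share at least one instance, a later round cannot gather a quorum without reaching an instance that took part in the earlier one.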
But the learning doesn’t stop after the talk! The code is open source. It comes with ready-to-use Terraform modules and Ansible playbooks to set up your own lab in minutes. Almost complete test coverage helps you experiment with the code and immediately judge the outcome of your own improvements and modifications. | |
Session Outline | |
Outline | |
Distributed Consensus Overview (2min) | |
Paxos (6min) [OPTIONAL] | |
Introducing Skinny (2min) | |
How Skinny reaches Consensus (5min) | |
How Skinny deals with Instance Failure (8min) | |
Skinny APIs (4min) | |
Implementation Challenge: Reaching out via Network (15min) | |
Implementation Challenge: Early Stopping (5min) | |
Implementation Challenge: Duelling Proposers (5min) [OPTIONAL] | |
Further Reading & Watching (2min) | |
Where to find the code and how to start your own lab (2min) | |
This proposal: ~46min. If the proposal is accepted, I would invest some time to streamline the topic here and there to make it fit the time slot. | |
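As an illustration of the "Early Stopping" challenge in the outline, the sketch below assumes that early stopping means no longer waiting for replies once the outcome is already decided: either a majority has granted the request or a majority has become unreachable. This reading, the function name `gather_until_majority`, and the idea of modelling each network call as a plain Python callable are assumptions made for this example, not taken from Skinny.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def gather_until_majority(calls: Sequence[Callable[[], bool]]) -> bool:
    """Run all calls concurrently and return as soon as the result is decided."""
    if not calls:
        return False
    needed = len(calls) // 2 + 1
    granted = 0
    failed = 0
    pool = ThreadPoolExecutor(max_workers=len(calls))
    try:
        futures = [pool.submit(call) for call in calls]
        for future in as_completed(futures):
            try:
                ok = future.result()
            except Exception:
                ok = False  # treat an unreachable instance as a refusal
            if ok:
                granted += 1
            else:
                failed += 1
            if granted >= needed:
                return True  # majority reached, slow replies no longer matter
            if failed > len(calls) - needed:
                return False  # a majority can no longer be reached
        return False
    finally:
        # Do not block on stragglers; their threads finish in the background.
        pool.shutdown(wait=False)
```

With five instances, the function returns after the third positive reply, so two slow or failed instances do not hold up the decision.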
Audience takeaways | |
During the talk, we (the audience and I) will examine the Skinny distributed lock service. I designed Skinny specifically for educational purposes. | |
The audience will learn about the Paxos protocol | |
The audience will see animated examples of typical Paxos runs (most people prefer visual explanations over written explanations) | |
The audience will learn how to implement a simple distributed consensus protocol as we gradually move from theory-level to coding-level, solving small challenges (network, availability, fault-tolerance) along the way. | |
The talk addresses engineers who have had little exposure to the inner workings of distributed consensus, who want to learn about distributed consensus as they start building distributed systems, or who have worked with ready-made distributed consensus solutions such as ZooKeeper and etcd but want to understand the theory as well. | |
Other notes for the program committee | |
Please note: This work is not affiliated with any company and is purely educational. I did this work before I joined Google. | |
A previous(!!) version of the talk is documented under the following link and ran for ~45min: Clicking this link will reveal my identity and un-blind the review! | |
How comfortable are you in speaking to large groups (>100 people)? | |
Fairly comfortable | |
Topics | |
Core Principles | |
Reliability and Resilience |