The papers included in CSE-586 reading.
- Paxos Made Moderately Complex. Robbert Van Renesse and Deniz Altinbuken, ACM Computing Surveys, 2015.
- Paxos made live - An engineering perspective. Tushar D. Chandra, Robert Griesemer, Joshua Redstone. ACM PODC, Pages: 398 - 407, 2007
- ZooKeeper: Wait-free coordination for internet-scale systems P. Hunt, M. Konar, F. P. Junqueira, and B. Reed USENIX ATC 2010.
- The Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike Burrows, OSDI 2006.
- In Search of an Understandable Consensus Algorithm. Diego Ongaro, John Ousterhout, USENIX ATC, 2014.
- WPaxos: Wide Area Network Flexible Consensus. Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Tevfik Kosar, IEEE TPDS, 2019.
- Dissecting the Performance of Strongly-Consistent Replication Protocols. Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Sigmod 2019.
- Viewstamped replication revisited. Barbara Liskov and James Cowling. MIT-CSAIL-TR-2012-021, 2012.
- Chain Replication for Supporting High Throughput and Availability. Robbert van Renesse and Fred B. Schneider, OSDI 2004.
- FAWN: A Fast Array of Wimpy Nodes David G. Andersen and Jason Franklin and Michael Kaminsky and Amar Phanishayee and Lawrence Tan and Vijay Vasudevan. SOSP 2009.
- CORFU: A shared log design for Flash clusters. Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, John D. Davis, NSDI'2012.
- Unreliable Failure Detectors for Reliable Distributed Systems, Tushar Deepak Chandra and Sam Toueg, Journal of the ACM, 1996.
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems. Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Renna Rodrigues, Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm, OSDI 2014.
- Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages, Haryadi S. Gunawi, Mingzhe Hao, and Riza O. Sumintom Agung Laksono, Anang D. Satria, Jeffry Adityatama, and Kurnia J. Eliazar, SOCC 2016.
- Does The Cloud Need Stabilizing? Murat Demirbas, Aleksey Charapko, Ailidani Ailijiang, 2018.
- TaxDC: A Taxonomy of nondeterministic concurrency bugs in datacenter distributed systems, Tanakorn Leesatapornwongsa, Jeffrey F. Lukman, Shan Lu, Haryadi S. Gunawi, ASPLOS 2016.
- Time, Clocks, and the Ordering of Events in a Distributed System. Leslie Lamport, Commn. of the ACM, 1978.
- Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases. Sandeep Kulkarni, Murat Demirbas, Deepak Madeppa, Bharadwaj Avva, and Marcelo Leone, 2014. https://cse.buffalo.edu/tech-reports/2014-04.pdf
- Distributed Snapshots: Determining Global States of a Distributed System. K. Mani Chandy Leslie Lamport, ACM Transactions on Computer Systems, 1985.
- Tail at scale. Jeff Dean, Luiz Andre Barroso, Commn of the ACM, 2013.
- Lessons from Giant-Scale Services. Eric A. Brewer, IEEE Internet Computing, 2001.
- Above the Clouds: A Berkeley View of Cloud Computing. Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica and Matei Zaharia. EECS Department University of California, Berkeley Technical Report No. UCB/EECS-2009-28 February 10, 2009.
- Serverless computing: One step forward, two steps back, UC Berkeley, CIDR 2019.
- Cloud Programming Simplified: A Berkeley View on Serverless Computing, 2019. https://arxiv.org/abs/1902.03383
- On designing and deploying Internet scale services, James Hamilton, LISA 2007.
- Life beyond Distributed Transactions: an Apostate’s Opinion. Pat Helland, CIDR 2007.
- Optimistic Replication. Yasushi Saito and Marc Shapiro, ACM Computing Surveys, 2005.
- CAP Twelve Years Later: How the "Rules" Have Changed. Eric Brewer, IEEE Computer, 2012
- PNUTS: Yahoo!'s Hosted Data Serving Platform. Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni, VLDB 2008.
- Dynamo: Amazon’s Highly Available Key-Value Store. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels, ACM SIGOPS 2007.
- Bigtable: A Distributed Storage System for Structured Data. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, ACM Transactions on Computer Systems, 2008.
- Spanner: Google’s Globally-Distributed Database James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman,Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh,Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura,David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak,Christopher Taylor, Ruth Wang, Dale Woodford, ACM Trans on Computer Systems, 2013.
- MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat, Commn of the ACM, 2008.
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. NSDI 2012. April 2012.
- TUX2: Distributed Graph Computation for Machine Learning, Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen and Ming Wu, Wei Li, Lidong Zhou, NSDI 2017.
- Proteus: agile ML elasticity through tiered reliability in dynamic resource markets. Aaron Harlap, Alexey Tumanov, Andrew Chung, Gregory R. Ganger, Phillip B. Gibbons. EuroSys, 2017.
- TensorFlow: A system for large-scale machine learning, Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. OSDI 2016.
- Practical Byzantine Fault Tolerance, Miguel Castro and Barbara Liskov, OSDI'99.
- Bitcoin: A Peer-to-Peer Electronic Cash System, Satoshi Nakamoto, 2008.
- Scalable and Probabilistic Leaderless BFT Consensus through Metastability, Team Rocket, Maofan Yin, Kevin Sekniqi, Robbert van Renesse, and Emin Gun Sirer, 2019.
- Untangling Blockchain: A Data Processing View of Blockchain Systems. Tien Tuan Anh Dinh, Rui Liu, Meihui Zhang, Gang Chen, Beng Chin Ooi, IEEE Transactions on Knowledge and Data Engineering, 2017.
- Bridging Paxos and Blockchain Consensus. A. Charapko, A. Ailijiang, M. Demirbas, IEEE Blockchain, 2018. http://www.cse.buffalo.edu/~demirbas/publications/bridging.pdf
- Blockchains from a distributed computing perspective, Maurice Herlihy, Commn of the ACM, 2017.