and1truong/2022 - SRE conferences.md

Created January 15, 2025 19:16

2022 - SRE conferences

SREcon22 Americas

The 'Success' in SRE Is Silent
Building and Running a Diversity-focused Pre-internship Program for SRE
A Postmortem of SRE Interviewing
Self-Destructing Feature Flags
Tales from the VOID: The Scary Truth about Incident Metrics
How We Survived (and Thrived) During The Pandemic and Helped Millions...
The Pandemic and The Classroom—Enabling Education for Millions
Applied Science Fiction: Operating a Research-Led Product
Taking the 737 to the Max
Securing Your Software Delivery Chain with Process Auditing
The Future of above-the-line Tooling
Tracing Bare Metal with OpenTelemetry
Are We There Yet? Metrics-Driven Prioritization for Your Reliability Roadmap
SRE stands for...Skydiving Resilience Engineer
Building a Path to the Future: Mentoring New SREs
eBPF: The Next Power Tool of SREs
How the Metrics Backend Works at Datadog
Automated Operating System and Environment Certification at LinkedIn...
Triaging Real-time Security Threats with eBPF-powered Observability
Exemplars in Practice: Finding the Needle in Your Observability Haystack
Dark Sky Camping: Reducing Alert Pollution with Modern Observability Practices
Ten-year Journey to 10,000 Production Machines
Beyond Distributed Tracing
History-based Latency Prober Tuning
Using Serverless Functions for Real-time Observability
Improving How We Observe Our Observability Data: Techniques for SREs
Principled Performance Analytics
Modeling Alert Quality
Emergent Organizational Failure: Five Disconnections
DO, RE, Me: Measuring the Effectiveness of Site Reliability Engineering
The Scientific Method for Resilience
A Fresh Look at Operational Debt

SREcon22 Europe/Middle East/Africa

Knowledge and Power: A Sociotechnical Systems Discussion on...
SRE as She Is Spoke
Oncall: An Equal Opportunity Waste of Time
Financial Regulators Worldwide Are Getting the Legal Right...
Statistics for Engineers
Measuring Reliability: What Got Us Here Won't Get Us There
Crayon Drawing Is a Vital Engineering Skill
Building Dynamic Configuration into Terraform
Hunting for Risky Dependencies in the World of Microservices
How We Implemented High Throughput Logging at Spotify
Engineering for Sustainability
SLOs, SREs, and GHGs
The Biases Confronting SREs
Market Data: Applying SRE Techniques to Legacy Designs
Life after The Chocolate Factory
Is Our Team as Resilient as Our Systems?
What SRE Could Be: Systems Reliability Engineering
Diamonds with Flaws: Examining the Pressures, Realities, and...
How We Drained Every Backbone Router Simultaneously
Break Free of the Template: Incident Writeups They Want to Read
Making the Impossible Impossible: Improving Reliability by...
Deep Dive: Azure Resource Manager Outage
Commas Save Lives, or at Least LinkedIn
Passing the Torch - Building a New Grad Program to Mentor...
Going from 30 to 30 Million SLOs
Disaster Recovery Testing at Booking.com
Slack's DNSSEC Rollout: Third Time's the Outage
Meatbag Systems: How Our Reliability Culture & Practice...
Principled Identification of "Root Causes" Using Techniques...
A Case Study in Chaos Testing: Uncovering Kernel Scaling Issues
A Better Way to Manage Command Line Tools: What We Learned...
Honey, I Broke the Things: Debugging Gray Failures...
The Repeat Incident Fallacy: What Jurassic Park Can Teach Us...
SRE in Enterprise
Unified Theory of SRE
Dissecting the Humble LSM Tree and SSTable
Caching Entire Systems without Invalidation
An SRE Guide to Linux Kernel Upgrades
The Math of Scalability
Schema-First Application Telemetry
SRE Is Weird, Down the Stack
SRE and ML: Why It Matters
Emotional Disaster Recovery: Debugging the Self with...
Over Nine Billion Dollars of SRE Lessons - the James Webb...
Rock Fishing and Incident Analysis: Increasing Insight
How Can SRE Help Security Governance?...
Navigating in the Dark

SREcon22 Asia/Pacific

Computing Performance 2022: What's on the Horizon
Move Fast and Learn Things: Principles of Cognition, Teaming...
How to Not Destroy Your Production Kubernetes Clusters
The Math behind the Incident Aftermath: A Practical Guide to Measuring...
OpenTelemetry and Observability: What, Why, and Why Now?
Principles of Safety and Reliability Learned from US Navy Landing Signal...
Infra Eng to Staff SRE: A Tale of Developing Yourself in an Ever Evolving...
Lifecycle of a Sample in the Prometheus TSDB
Metrics Stream Processing Using Riemann
Lifecycle of Reusable Automations: Track, Maintain, Deprecate
Dashboards and Runbooks: Scrapbooking for Engineers
Observability Is Not Analytics!
Lessons Learned Building a Global Synthetic Monitoring System
Sustaining Everything, Everywhere, All at Once!
Introducing the Reliability Map – r9y.dev
Chaos Engineering at Scale
The Multi Layered Cake of Resilience
Capacity vs Efficiency: Building a Globally Scalable Cloud Database
Improving Observability, Reliability, and Security of Relational Database...
Real-Time Adaptive Controls for Resilient Distributed Systems
Improving Machine Learning Development Reliability
How Can We Make Data Integrity Easy?
Cognitive and Self-Adaptive System for Effective Distributed-Tracing...
Site Reliability Evangelism: Practice Start-up within an Established...
Deploying Humans at the Edge of SRE
Challenges, Best Practices, and Solutions for Monitoring and Alerting...
A Better Way to Manage Stateful Systems: Design for Observability and Robust
Reliability Reviews in the Wild: Using Data to Drive Production Health
Leveraging Continuous Production Profiling for Providing Insights into...
Applying SRE Principles to CI/CD
Gremlins Exposed: Shining a Light on Mischievous Systems
Burnout at Scale: What to Try When You Just Can't
Backend API Design for SREs
Online Database Reliability, Performance, and Consistency Engineering
Migrating Datastores
Our Experience Tracking and Driving SLO Adoption at Goldman Sachs
Operationalizing ML Training Infra at Meta Scale
Advanced Linux Kernel Networking Monitoring
Using the Internet as Your Load-Balancer
A Post Incident Review Review
