Conversations About Running Production Systems at Scale
Publication Date: 2018-06-25
Number of pages: 300
Organizations—big and small—have started to realize just how crucial system and application reliability is to their business. At the same time, they’ve also learned just how difficult it is to maintain that reliability while iterating at the speed demanded by the marketplace. Site Reliability Engineering (SRE) is a proven approach to this challenge.
SRE is a large and rich topic to discuss. Google led the way with Site Reliability Engineering, the wildly successful O’Reilly book that described Google’s creation of the discipline and the implementation that has allowed them to operate at a planetary scale. Inspired by that earlier work, this book explores a very different part of the SRE space.
The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now. Listen as engineers and other leaders in the field discuss different ways of implementing SRE and SRE principles in a wide variety of settings; how SRE relates to other approaches like DevOps; the specialities on the cutting edge that will soon be commonplace in SRE; and best practices and technologies that make practicing SRE easier. Finally, hear what people have to say about the important but rarely discussed human side of SRE.
David N. Blank-Edelman is the book’s curator and editor.
Among the essays in this Early Release edition, you’ll find:
How to Apply SRE Principles without Dedicated SRE Teams, by Björn Rabenstein and Matthias Rampke, SoundCloud Ltd
The Intersection of Reliability and Privacy, by Betsy Beyer and Amber Yust, Google
The Art and Science of the SLO (Service Level Objectives), by Theo Schlossnagle, Circonus
Immutable Infrastructure and SRE, by Jonah Horowitz, Stripe
Scriptable Load Balancers, by Emil Stolarsky, Shopify
The Service Mesh: Wrangler of Your Microservices?, by Matt Klein, Lyft
Psychological Safety in SRE, by John Looney, Intercom