We are the Google Site Reliability team. We make Google’s websites work. Ask us Anything!
We are the Google Site Reliability (SRE) team. We’re responsible for the 24x7 operation of Google.com, as well as the technical infrastructure behind many other Google products such as Gmail, Maps, G+ and other stuff you know and love. We’ve traditionally been invisible and behind-the-scenes, but we thought we’d drop in here and answer any questions about what we do, what stuff we come up against, and what it’s like to be an SRE.
Other interesting things to give you an idea of what we do:
A blog post about the Leap Second, written by Chris Pascoe from SRE, gives an idea of the kind of hairy problems we come up against.
Steven Levy wrote a Wired article about the inside of our datacenters, and managed to make us sound like some sort of amazing justice team.
Kripa (who’s one of our participants today!) also writes about DiRT for ACM Queue.
We’ll be here from 12pm to 2pm PST to answer your questions, and we'll post info on our participants then.
Proof (official Google accounts):
EDIT 11:50PST: We're just getting set up here to answer your questions. We are:
Kripa Krishnan (/u/kripakrishnan), SRE Technical Program Manager and DiRT mastermind from our Mountain View HQ. Kripa works on infrastructure efforts in Google Apps.
Cody Smith (/u/clusteroops), long-time senior SRE from Mountain View. Cody works on Search and Infrastructure.
Dave O’Connor (/u/sre_pointyhair), Site Reliability Manager from our Dublin, Ireland office. Dave manages the Storage SRE team in Dublin that runs Bigtable, Colossus, Spanner, and other storage tech our products are built on.
John Collins (/u/jrc-sre), SRE Ombudsman, advocate and general force for good, from Mountain View.
EDIT 13:56PST: OK folks, we're all done. Thanks for the questions, hope our answers were satisfactory. May the queries flow and the pagers be silent.
*EDIT Jan 30: Corrected the spelling of @stevenlevy's name. Whoops-a-daisy.*