February 11, 2020

MSR Cambridge Lab Lecture: Sankie: Using Data to Build Better Systems and Services

11:00-12:00

Location: MSR Cambridge

Today’s systems and services are large and complex, often supporting millions or even billions of entities. Such systems are extremely dynamic: developers continuously commit code, introducing new features, fixes and, consequently, new bugs. Multiple problems crop up in such a dynamic environment, including misconfiguration of essential services, very slow testing and deployment procedures, and extended service disruptions when catastrophic bugs reach deployment. Over the last three years, we have been working on Project Sankie, which treats code, test logs and telemetry as data and builds several analyses on them to aid engineers. My talk will describe two of these analyses in detail. First, I will present Rex, a tool that uses commit logs from the last six months to find and flag misconfigurations at commit time, so that they do not enter deployment. Next, I will present Orca, a bug localization tool that yields a three-fold reduction in Office 365’s on-call engineer workload. Finally, I will briefly summarize the status of Project Sankie and share some lessons learned about which strategies worked and which did not.