Lorin Hochstein

SE
3 papers
308 citations
Novelty 25%
AI Score 19

3 Papers

SE, May 12, 2019
Automating chaos experiments in production

Ali Basiri, Lorin Hochstein, Nora Jones et al.

Distributed systems often face transient errors and localized component degradation and failure. Verifying that the overall system remains healthy in the face of such failures is challenging. At Netflix, we have built a platform for automatically generating and executing chaos experiments, which check how well the production system can handle component failures and slowdowns. This paper describes the platform and our experiences operating it.

SE, Feb 20, 2017
A Platform for Automating Chaos Experiments

Ali Basiri, Aaron Blohowiak, Lorin Hochstein et al.

The Netflix video streaming system is composed of many interacting services. In such a large system, failures in individual services are not uncommon. This paper describes the Chaos Automation Platform, a system for running failure injection experiments on the production system to verify that failures in non-critical services do not result in system outages.

SE, Feb 20, 2017
Chaos Engineering

Ali Basiri, Niosha Behnam, Ruud de Rooij et al.

Modern software-based services are implemented as distributed systems with complex behavior and failure modes. Many large tech organizations are using experimentation to verify the reliability of such systems. We use the term "Chaos Engineering" to refer to this approach, and discuss the underlying principles and how to use it to run experiments.