Casey Rosenthal

2 Papers

SE, Feb 20, 2017
A Platform for Automating Chaos Experiments

Ali Basiri, Aaron Blohowiak, Lorin Hochstein et al.

The Netflix video streaming system is composed of many interacting services. In such a large system, failures in individual services are not uncommon. This paper describes the Chaos Automation Platform, a system for running failure injection experiments on the production system to verify that failures in non-critical services do not result in system outages.

SE, Feb 20, 2017
Chaos Engineering

Ali Basiri, Niosha Behnam, Ruud de Rooij et al.

Modern software-based services are implemented as distributed systems with complex behavior and failure modes. Many large tech organizations are using experimentation to verify the reliability of such systems. We use the term "Chaos Engineering" to refer to this approach, and discuss the underlying principles and how to use it to run experiments.