SE, Feb 20, 2017

A Platform for Automating Chaos Experiments

arXiv:1702.05849v1, 21 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of maintaining high availability in complex, service-based systems such as Netflix's. It appears incremental, since it builds on existing chaos engineering concepts.

The paper tackles the problem of ensuring system reliability in Netflix's large-scale video streaming service by developing the Chaos Automation Platform, which automates failure injection experiments in production to verify that failures in non-critical services do not cause system outages.

The Netflix video streaming system is composed of many interacting services. In such a large system, failures in individual services are not uncommon. This paper describes the Chaos Automation Platform, a system for running failure injection experiments on the production system to verify that failures in non-critical services do not result in system outages.

