AI Loss of Control Incident Management: Response & Resilience
This paper provides a foundational framework for policymakers and AI developers to manage unprecedented AI risks, particularly catastrophic loss of control incidents.
This paper addresses a gap in the AI loss of control (LOC) literature by introducing a framework and taxonomy for managing catastrophic AI LOC incidents. It distinguishes scenarios in which regaining control is 'extremely costly' from those in which it is 'impossible', and proposes a different response for each: resilience investments for impossible scenarios, and containment and threat neutralization for extremely costly ones.
Recent research demonstrating that AI systems exhibit deception and shutdown resistance suggests that AI loss of control (LOC) is an urgent policy concern, yet current literature focuses almost exclusively on alignment and prevention. To address this gap, this paper introduces a foundational framework and taxonomy for managing catastrophic AI LOC incidents. The taxonomy's first level distinguishes between scenarios where regaining control is 'extremely costly' and those where it is 'impossible'. While impossible scenarios demand immediate resilience investments to fundamentally restrict an AI's attack surface, extremely costly scenarios require active incident management via Containment and Threat Neutralization. The framework further categorizes these manageable events into accidental LOC (requiring automated circuit-breaker responses) and adversarial LOC (requiring graduated escalatory measures). By mapping three severity classes to specific scenario matrices, this paper provides a concrete, proportional guide for managing unprecedented AI risks.
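The two-level taxonomy described above can be read as a small decision structure. The following is a minimal sketch of that reading only, not the paper's own formalization: the names `Recoverability`, `Origin`, and `response_strategy` are hypothetical, and the returned labels simply repeat the response categories named in the abstract.

```python
from enum import Enum


class Recoverability(Enum):
    """First taxonomy level: how hard it is to regain control (hypothetical names)."""
    EXTREMELY_COSTLY = "extremely costly"
    IMPOSSIBLE = "impossible"


class Origin(Enum):
    """Second level, applied only to manageable (extremely costly) incidents."""
    ACCIDENTAL = "accidental"
    ADVERSARIAL = "adversarial"


def response_strategy(recoverability, origin=None):
    """Return the response categories the abstract associates with a scenario.

    This is an illustrative mapping under the assumptions stated above; it
    does not encode any thresholds or procedures from the paper itself.
    """
    if recoverability is Recoverability.IMPOSSIBLE:
        # Impossible scenarios: the only lever is resilience investment made in advance.
        return ["resilience investments"]

    # Extremely costly scenarios call for active incident management.
    strategies = ["containment", "threat neutralization"]
    if origin is Origin.ACCIDENTAL:
        strategies.append("automated circuit-breaker responses")
    elif origin is Origin.ADVERSARIAL:
        strategies.append("graduated escalatory measures")
    return strategies


if __name__ == "__main__":
    for origin in Origin:
        print(origin.value, response_strategy(Recoverability.EXTREMELY_COSTLY, origin))
    print("impossible", response_strategy(Recoverability.IMPOSSIBLE))
```

The sketch deliberately leaves out the three severity classes and the scenario matrices, because the abstract does not define them.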