AI · GT · LG · Mar 22

The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

arXiv:2603.20994 · 18.8 · h-index: 15
AI Analysis

This work addresses the safety-critical challenge of enabling AI assistants to disobey humans in shared control scenarios when obeying would cause harm. It provides a mathematical foundation for algorithmic development and empirical study, though the contribution is incremental in that it formalizes an existing concept.

The paper formalizes intelligent disobedience in shared autonomy, where an automated assistant must decide when to override human instructions to prevent harm. It introduces the Intelligent Disobedience Game (IDG), a sequential game-theoretic framework based on Stackelberg games, and identifies strategic phenomena such as "safety traps," in which safety is maintained but the human's goal is never achieved.

In shared autonomy, a critical tension arises when an automated assistant must choose between obeying a human's instruction and deliberately overriding it to prevent harm. This safety-critical behavior is known as intelligent disobedience. To formalize this dynamic, this paper introduces the Intelligent Disobedience Game (IDG), a sequential game-theoretic framework based on Stackelberg games that models the interaction between a human leader and an assistive follower operating under asymmetric information. It characterizes optimal strategies for both agents across multi-step scenarios, identifying strategic phenomena such as "safety traps," where the system indefinitely avoids harm but fails to achieve the human's goal. The IDG provides a needed mathematical foundation that enables both the algorithmic development of agents that can learn safe non-compliance and the empirical study of how humans perceive and trust disobedient AI. The paper further translates the IDG into a shared control Multi-Agent Markov Decision Process representation, forming a compact computational testbed for training reinforcement learning agents.
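
To make the "safety trap" idea concrete, here is a minimal toy sketch in Python. It is not the paper's formal IDG or its MDP representation: the corridor world, the fixed policies, and every name (`ToyCorridor`, `human_policy`, `assistant_policy`, `rollout`) are assumptions introduced only for illustration. The human (leader) cannot see the hazard and always instructs a step toward the goal; the assistant (follower) sees the hazard and disobeys by staying put whenever obeying would enter it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyCorridor:
    """Hypothetical 1-D world; the hazard is visible to the assistant only."""
    goal: int
    hazard: Optional[int]  # None means there is no hazard


def human_policy(pos: int, env: ToyCorridor) -> int:
    """Leader: ignores env.hazard (cannot see it), always steps toward the goal."""
    if pos < env.goal:
        return 1
    if pos > env.goal:
        return -1
    return 0


def assistant_policy(pos: int, instruction: int, env: ToyCorridor) -> int:
    """Follower: obey unless the instructed move would enter the hazard."""
    if env.hazard is not None and pos + instruction == env.hazard:
        return 0  # intelligent disobedience: stay in place
    return instruction


def rollout(env: ToyCorridor, start: int, horizon: int) -> Tuple[str, int, int]:
    """Simulate up to `horizon` steps; return (outcome, final_pos, num_disobeyed)."""
    pos, disobeyed = start, 0
    for _ in range(horizon):
        if pos == env.goal:
            break
        instruction = human_policy(pos, env)
        action = assistant_policy(pos, instruction, env)
        if action != instruction:
            disobeyed += 1
        pos += action
    outcome = "goal" if pos == env.goal else "safety_trap"
    return outcome, pos, disobeyed


if __name__ == "__main__":
    print(rollout(ToyCorridor(goal=4, hazard=None), start=0, horizon=10))
    print(rollout(ToyCorridor(goal=4, hazard=2), start=0, horizon=10))
```

In the second rollout the hazard sits on the only path to the goal, so the assistant blocks every step and the human never changes the instruction: harm is avoided indefinitely, but the goal is never reached. The paper characterizes optimal strategies for both agents, whereas this sketch fixes both policies by hand.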

