AI LG Feb 2, 2021

Agent Incentives: A Causal Perspective

Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

arXiv:2102.01685v2 26.567 citations

Originality Highly original

AI Analysis

This work provides a foundational framework for evaluating the safety and fairness of AI systems by offering new tools for understanding agent incentives.

This paper introduces a causal framework for analyzing agent incentives, establishing completeness for a known value of information criterion and proposing a new sound and complete graphical criterion for value of control. It also introduces and provides sound and complete graphical criteria for two new incentive concepts: response incentives and instrumental control incentives.

We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.
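To make the idea of a graphical criterion concrete, here is a minimal sketch for instrumental control incentives. It assumes the criterion amounts to checking for a directed path from the decision node through X to a utility node, which is one natural reading of "influence its utility via a variable X"; the abstract does not spell the criterion out. The graph encoding, the function names, and the example diagram are all hypothetical and not taken from the paper.

```python
from collections import deque


def reachable(graph, start, target):
    """Return True if a directed path (possibly of length zero) leads from start to target."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False


def admits_instrumental_control_incentive(graph, decision, x, utilities):
    """Assumed criterion: some directed path decision -> ... -> x -> ... -> utility exists."""
    return reachable(graph, decision, x) and any(
        reachable(graph, x, u) for u in utilities
    )


# Hypothetical single-decision diagram, stored as node -> list of children:
# Z is observed by the decision D and also affects the utility U directly;
# D affects U only through X.
example_cid = {
    "Z": ["D", "U"],
    "D": ["X"],
    "X": ["U"],
    "U": [],
}

print(admits_instrumental_control_incentive(example_cid, "D", "X", ["U"]))
print(admits_instrumental_control_incentive(example_cid, "D", "Z", ["U"]))
```

In this toy diagram the check succeeds for X, which lies on a path from D to U, and fails for Z, which the decision cannot affect. The other criteria mentioned in the abstract (value of information, value of control, response incentives) are not covered by this sketch.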

