AI · LG · Feb 2, 2021

Agent Incentives: A Causal Perspective

arXiv:2102.01685v2 · 66 citations
Originality: Highly original
AI Analysis

This work provides a foundational framework for evaluating the safety and fairness of AI systems by offering new tools for understanding agent incentives.

This paper introduces a causal framework for analyzing agent incentives. It proves that a known value-of-information criterion is complete and proposes a new sound and complete graphical criterion for value of control. It also introduces two new incentive concepts, response incentives and instrumental control incentives, and gives sound and complete graphical criteria for both.

We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.
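The abstract does not spell out the graphical criteria. As a rough illustration, the sketch below checks one plausible reading of the instrumental control incentive criterion in a single-decision diagram. It assumes that an incentive on X exists when some directed path runs from the decision D through X to a utility node. The graph encoding (a dict of child lists), the function names and the toy diagram are hypothetical and are not taken from the paper.

```python
from collections import deque

# Hypothetical encoding: a causal influence diagram as a DAG mapping each node
# to the list of its children (decision, chance and utility nodes alike).
Graph = dict[str, list[str]]


def descendants(graph: Graph, start: str) -> set[str]:
    """Return every node reachable from `start` along directed edges (BFS)."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def admits_instrumental_control_incentive(
    graph: Graph, decision: str, x: str, utilities: set[str]
) -> bool:
    """Assumed criterion: some directed path runs decision -> ... -> x -> ... -> U
    for a utility node U. A utility node x counts as its own endpoint."""
    if x == decision or x not in descendants(graph, decision):
        return False
    # x lies downstream of the decision; now check that a utility is downstream of x.
    reachable_from_x = descendants(graph, x) | {x}
    return not utilities.isdisjoint(reachable_from_x)


# Toy single-decision diagram (hypothetical): D -> X -> U, D -> Y, Z -> U.
example = {"D": ["X", "Y"], "X": ["U"], "Y": [], "Z": ["U"], "U": []}
utility_nodes = {"U"}

for node in ["X", "Y", "Z"]:
    print(node, admits_instrumental_control_incentive(example, "D", node, utility_nodes))
# prints: X True / Y False / Z False (one per line)
```

Breadth-first search keeps each reachability check linear in the size of the graph. The other criteria mentioned above (value of information, value of control, response incentives) are, as we understand them, more involved because they depend on d-separation rather than plain directed reachability, so this sketch does not cover them.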
