AI · LG · Jan 20, 2020

Incentives for Responsiveness, Instrumental Control and Impact

arXiv:2001.07118v3 · 13 citations
AI Analysis

This work addresses the problem of ensuring that AI agents act safely and fairly by analyzing their incentives. It is partly incremental: it extends prior conference publications, although the material on impact incentives and multi-decision settings is new.

The paper introduces three concepts to describe agent incentives (response, instrumental control, and impact), establishes sound and complete graphical criteria for each, and discusses classes of techniques for promoting safe and fair behavior.

We introduce three concepts that describe an agent's incentives: response incentives indicate which variables in the environment, such as sensitive demographic information, affect the decision under the optimal policy. Instrumental control incentives indicate whether an agent's policy is chosen to manipulate part of its environment, such as the preferences or instructions of a user. Impact incentives indicate which variables an agent will affect, intentionally or otherwise. For each concept, we establish sound and complete graphical criteria, and discuss general classes of techniques that may be used to produce incentives for safe and fair agent behaviour. Finally, we outline how these notions may be generalised to multi-decision settings. This journal-length paper extends our conference publications "Incentives for Responsiveness, Instrumental Control and Impact" and "Agent Incentives: A Causal Perspective": the material on response incentives and instrumental control incentives is updated, while the work on impact incentives and multi-decision settings is entirely new.
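To give a feel for what a graphical criterion looks like in practice, here is a minimal sketch in Python. The abstract does not state the criteria themselves, so the check below is an assumption for illustration only: it tests whether a variable X lies on a directed path from the decision to a utility node, which is the form of path-based test used for instrumental control incentives in the related "Agent Incentives: A Causal Perspective" work. The graph, the node names (`D`, `U`, `UserPrefs`, `Demographics`, `Clicks`) and the function names are hypothetical.

```python
from collections import defaultdict


def reachable(edges, start):
    """Return every node reachable from `start` along directed edges (start included)."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in children[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def on_decision_to_utility_path(edges, decision, utilities, x):
    """Assumed criterion: True if a directed path decision -> ... -> x -> ... -> utility exists."""
    if x not in reachable(edges, decision):
        return False
    downstream_of_x = reachable(edges, x)
    return any(u in downstream_of_x for u in utilities)


# Hypothetical single-decision diagram: D is the decision, U the utility.
edges = [
    ("Demographics", "D"),  # information the decision may respond to
    ("D", "UserPrefs"),     # the decision can influence user preferences
    ("UserPrefs", "U"),     # preferences affect utility
    ("D", "Clicks"),        # a side effect with no path to utility
]

for node in ("UserPrefs", "Demographics", "Clicks"):
    print(node, on_decision_to_utility_path(edges, "D", ["U"], node))
```

In this toy graph only `UserPrefs` sits between the decision and the utility. `Clicks` is affected by the decision without feeding into utility, which is the kind of variable the abstract's separate notion of impact is meant to capture.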
