AI

May 13, 2021

Intelligence and Unambitiousness Using Algorithmic Information Theory

Michael K. Cohen, Badri Vellambi, Marcus Hutter

arXiv:2105.06268v1

8.93 citations

Originality: Incremental advance

AI Analysis

This paper addresses a safety concern in AGI by designing an agent that learns not to seek power. The advance is incremental in that it builds on the existing AIXI framework and causal influence theory.

The paper tackles the problem of general reinforcement learning agents seeking arbitrary power in order to manipulate their own reward. It proposes an "unambitious" variant of AIXI that learns not to seek such power. The authors show that this agent learns to accrue reward at least as well as a human mentor while relying on that mentor with diminishing probability. Under a formal assumption that they probe empirically, the agent eventually learns that intervening in the outside world has no effect on reward acquisition.

Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and undiscovered tractable approximations are likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an agent might learn to solve arbitrary solvable problems, gives an agent a dangerous incentive: to gain arbitrary "power" in order to intervene in the provision of their own reward. We review the arguments that generally intelligent algorithmic-information-theoretic reinforcement learners such as Hutter's (2005) AIXI would seek arbitrary power, including over us. Then, using an information-theoretic exploration schedule, and a setup inspired by causal influence theory, we present a variant of AIXI which learns to not seek arbitrary power; we call it "unambitious". We show that our agent learns to accrue reward at least as well as a human mentor, while relying on that mentor with diminishing probability. And given a formal assumption that we probe empirically, we show that eventually, the agent's world-model incorporates the following true fact: intervening in the "outside world" will have no effect on reward acquisition; hence, it has no incentive to shape the outside world.
