AI · Nov 25, 2021

Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

Peter Vamplew, Benjamin J. Smith, Johan Kallstrom, Gabriel Ramos, Roxana Radulescu, Diederik M. Roijers, Conor F. Hayes, Fredrik Heintz, Patrick Mannion, Pieter J. K. Libin, Richard Dazeley, Cameron Foale

arXiv:2112.15422v1 · 26.586 citations

Originality: Incremental advance

AI Analysis

This is a foundational critique for AI researchers and theorists, addressing the theoretical underpinnings of reward maximization in intelligence.

The paper challenges the assumption that scalar rewards are sufficient for intelligence, arguing that multi-objective models are necessary to account for biological and computational aspects, and warns against using scalar rewards for artificial general intelligence due to safety and ethical risks.

The recent paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, it is still undesirable to use this approach for the development of artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.
