AI | LG | Jun 21, 2016

Concrete Problems in AI Safety

arXiv:1606.06565v2 | 3146 citations
AI Analysis

This work identifies concrete safety problems for AI developers and researchers. It is incremental: it builds on existing concerns rather than introducing new methods.

The paper addresses accidents in machine learning systems, defined as unintended and harmful behavior arising from poor design of real-world AI systems. It presents five practical research problems, grouped by whether they stem from a wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process, and it suggests research directions relevant to cutting-edge AI systems.

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.

Code Implementations: 1 repo
