AI · Jun 5, 2019

Risks from Learned Optimization in Advanced Machine Learning Systems

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

arXiv:1906.01820v3

Originality: Incremental advance

AI Analysis

This work identifies potential safety and transparency issues in advanced ML systems. It is foundational for AI safety research, though largely theoretical and incremental in nature.

The paper analyzes mesa-optimization, in which a learned model is itself an optimizer. It investigates the conditions under which this occurs and how the learned model's objective can be aligned with the training objective, with the aim of addressing safety and transparency risks in advanced machine learning systems.

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer, a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be, how will it differ from the loss function it was trained under, and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and an overview of topics for future research.
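A toy illustration of the second question may help. It is a minimal sketch under invented assumptions and is not taken from the paper. The setup uses a hypothetical 1-D world whose cells each may hold a goal and a marker. The learned policy is itself an optimizer: at run time it searches over positions and maximizes an internal objective, called the mesa-objective here following the paper's terminology. An outer "base optimizer" selects that policy's parameters by training score alone. Because the marker always sits on the goal during training, a marker-seeking policy and a goal-seeking policy score identically there. The training signal cannot tell them apart, and which one is returned depends only on the arbitrary order of the candidate list. All names (`POSITIONS`, `make_mesa_optimizer`, `base_optimizer`, the goal/marker world) are hypothetical.

```python
# Hypothetical toy world: 10 cells; an episode is a (goal, marker) pair.
POSITIONS = range(10)

def base_objective(pos, goal):
    """Base objective (what training rewards): negative distance to the goal."""
    return -abs(pos - goal)

def make_mesa_optimizer(w_goal, w_marker):
    """Return a policy that is itself an optimizer: at run time it searches
    every position and picks the one maximizing its own internal
    mesa-objective, which is parameterized by (w_goal, w_marker)."""
    def policy(goal, marker):
        def mesa_objective(pos):
            return -(w_goal * abs(pos - goal) + w_marker * abs(pos - marker))
        return max(POSITIONS, key=mesa_objective)
    return policy

def average_score(policy, episodes):
    """Mean base objective of a policy over a list of (goal, marker) episodes."""
    return sum(base_objective(policy(g, m), g) for g, m in episodes) / len(episodes)

def base_optimizer(candidates, episodes):
    """Stand-in for the base optimizer (e.g. gradient descent): return the
    candidate parameters with the best training score. On ties, Python's
    max() keeps the first candidate encountered."""
    return max(candidates,
               key=lambda w: average_score(make_mesa_optimizer(*w), episodes))

train = [(g, g) for g in POSITIONS]              # marker always on the goal
deploy = [(g, (g + 5) % 10) for g in POSITIONS]  # marker and goal come apart
candidates = [(0.0, 1.0), (1.0, 0.0)]            # "seek marker" vs "seek goal"

learned = base_optimizer(candidates, train)
print(learned)                                               # (0.0, 1.0)
print(average_score(make_mesa_optimizer(*learned), train))   # 0.0
print(average_score(make_mesa_optimizer(*learned), deploy))  # -5.0
```

Both candidates achieve a perfect training score. The selected policy, however, optimizes for the marker rather than the goal, and its performance collapses once the two separate at deployment. This is the gap between the training loss and the learned model's own objective that the abstract asks about.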

