LG · AI · Dec 5, 2023

Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

arXiv:2312.03762v1 · 1 citation · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This paper addresses goal misgeneralization in RL agents, offering researchers incremental insights into how arbitrary preferences emerge from training procedures.

The study investigated color versus shape goal misgeneralization in reinforcement learning agents. It found that agents learn to detect the goal through a specific color channel rather than by its shape, an arbitrary choice, and that, due to underspecification, this preference can change when agents are retrained with a different random seed.

We explore colour versus shape goal misgeneralization originally demonstrated by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an ambiguous choice, the agents seem to prefer generalization based on colour rather than shape. After training over 1,000 agents in a simplified version of the environment and evaluating them on over 10 million episodes, we conclude that the behaviour can be attributed to the agents learning to detect the goal object through a specific colour channel. This choice is arbitrary. Additionally, we show how, due to underspecification, the preferences can change when retraining the agents using exactly the same procedure except for using a different random seed for the training run. Finally, we demonstrate the existence of outliers in out-of-distribution behaviour based on training random seed alone.
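The underspecification argument can be illustrated with a toy model. This is not the authors' setup (they train RL agents in Procgen Maze); it is a minimal sketch assuming a logistic-regression "goal detector" with two hypothetical input features, one standing in for a colour channel and one for shape, that are perfectly correlated during training. Because both features take the same value on every training example, gradient descent updates both weights by the same amount, so which feature dominates out of distribution is fixed by the random initialisation alone. The names `train_detector`, `prefers_colour` and the two-feature encoding are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(seed, steps=2000, lr=0.5):
    """Train a toy goal detector on features (colour, shape).

    Hypothetical encoding: x[0] = "goal colour present", x[1] = "goal shape present".
    In training the goal always has both (1, 1) and non-goals have neither (0, 0),
    so the task does not specify which feature the detector should rely on.
    """
    rng = random.Random(seed)
    w_init = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    w = list(w_init)
    b = 0.0
    data = [((1.0, 1.0), 1.0), ((0.0, 0.0), 0.0)]
    for _ in range(steps):
        for x, y in data:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            # x[0] == x[1] on every example, so both weights get the same update
            # and w[0] - w[1] stays at its initial value.
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b, w_init


def prefers_colour(w, b):
    # Out-of-distribution probe: an object with the goal colour but the wrong
    # shape versus one with the goal shape but the wrong colour.
    colour_only = sigmoid(w[0] + b)
    shape_only = sigmoid(w[1] + b)
    return colour_only > shape_only


if __name__ == "__main__":
    for seed in range(5):
        w, b, _ = train_detector(seed)
        print(seed, "colour" if prefers_colour(w, b) else "shape")
```

Every seed fits the training data equally well, yet different seeds can end up preferring colour or shape on the decorrelated probe, which is the sense in which the choice is arbitrary.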

Code Implementations: 1 repo