LG AI ML

May 25, 2020

Policy Entropy for Out-of-Distribution Classification

Andreas Sedlmeier, Robert Müller, Steffen Illium, Claudia Linnhoff-Popien

arXiv:2005.12069v17.216 citations

Originality: Incremental advance

AI Analysis

The paper addresses safety risks in real-world RL deployments by enabling reliable detection of situations the agent was not trained on. The contribution appears incremental, since it builds on existing one-class classification approaches.

The paper tackles the problem of detecting out-of-distribution states in reinforcement learning to improve safety, proposing PEOC, a policy entropy-based classifier that shows competitive performance against state-of-the-art methods in procedural environments.

One critical prerequisite for the deployment of reinforcement learning systems in the real world is the ability to reliably detect situations on which the agent was not trained. Such situations could lead to potential safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy entropy based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It is based on using the entropy of an agent's policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive against state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.
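The core idea in the abstract, using the entropy of the agent's policy as the score of a one-class classifier, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a discrete action space, assumes that a higher entropy indicates an out-of-distribution state, and uses a hypothetical threshold that in practice would have to be calibrated on in-distribution states. The names `policy_entropy`, `is_out_of_distribution`, and `THRESHOLD` are illustrative only.

```python
import math
from typing import Sequence


def policy_entropy(action_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    total = sum(action_probs)
    if total <= 0:
        raise ValueError("action probabilities must sum to a positive value")
    entropy = 0.0
    for p in action_probs:
        q = p / total  # renormalise to guard against rounding drift
        if q > 0.0:    # the term 0 * log(0) is taken to be 0
            entropy -= q * math.log(q)
    return entropy


def is_out_of_distribution(action_probs: Sequence[float], threshold: float) -> bool:
    """Flag a state as out-of-distribution when the entropy score exceeds the threshold.

    Assumption: a less certain (higher-entropy) policy signals an unfamiliar state.
    """
    return policy_entropy(action_probs) > threshold


# Hypothetical threshold in nats; a real one would be tuned on held-out data.
THRESHOLD = 0.7

confident_policy = [0.97, 0.01, 0.01, 0.01]  # peaked distribution, low entropy
uncertain_policy = [0.25, 0.25, 0.25, 0.25]  # uniform distribution, maximal entropy

print(is_out_of_distribution(confident_policy, THRESHOLD))
print(is_out_of_distribution(uncertain_policy, THRESHOLD))
```

Because the score is a single scalar per state, the classifier reduces to a threshold choice, which is what makes it comparable to other one-class classification methods on the same benchmark.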
