LG · AI · Oct 29, 2021

GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL

arXiv:2110.15489v1 · 3 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for safe and robust generalization in RL by enabling agents to detect distribution shifts through active experimentation. The advance is incremental because it builds on existing OOD detection concepts from supervised learning.

The paper tackles out-of-distribution detection in reinforcement learning. It defines a causal framework and proposes a novel task, Out-of-Task Distribution (OOTD) detection, in which an RL agent actively experiments in a test environment to determine whether that environment is out-of-task distribution. The authors report that their method, GalilAI, significantly outperforms a probabilistic neural network baseline.

Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes of supervised learning methods to the reinforcement learning (RL) setting, however, is difficult because of the data-generating process: RL agents actively query their environment for data, and the data are a function of the policy followed by the agent. An agent could thus neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to bridge this lacuna by first defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task: that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent that actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL. We find that GalilAI outperforms the baseline significantly. See visualizations of our method at https://galil-ai.github.io/
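The abstract's central point, that a shift can go unnoticed when the policy never exercises the part of the environment that changed, can be illustrated with a toy sketch. This is not the GalilAI method or the paper's baseline. Everything here is an assumption made for illustration: the 1-D dynamics, the function names (`step`, `model_predict`, `mean_nll`), the fixed Gaussian "model" standing in for a learned probabilistic network, and the threshold value. The score is the mean negative log-likelihood of observed transitions under the model of the training dynamics.

```python
import math
import random


def step(state, action, shifted, rng):
    """Toy 1-D dynamics (hypothetical): action 0 drifts by +1.0, action 1 by +2.0.
    In the shifted environment only the effect of action 1 changes (+3.0)."""
    if action == 0:
        drift = 1.0
    else:
        drift = 3.0 if shifted else 2.0
    return state + drift + rng.gauss(0.0, 0.1)


def model_predict(state, action):
    """Stand-in for a learned probabilistic model of the training dynamics.
    Returns the predicted mean and standard deviation of the next state."""
    return state + (1.0 if action == 0 else 2.0), 0.1


def gaussian_nll(x, mean, std):
    """Negative log-likelihood of x under a normal distribution N(mean, std^2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)


def mean_nll(policy, shifted, n_steps=200, seed=0):
    """Roll out the policy and average the model's NLL over observed transitions."""
    rng = random.Random(seed)
    state, total = 0.0, 0.0
    for t in range(n_steps):
        action = policy(t)
        next_state = step(state, action, shifted, rng)
        mean, std = model_predict(state, action)
        total += gaussian_nll(next_state, mean, std)
        state = next_state
    return total / n_steps


def passive(t):
    """Policy that never tries action 1, so it never probes the shifted dynamics."""
    return 0


def experimenter(t):
    """Policy that alternates actions, so it exercises both parts of the dynamics."""
    return t % 2


THRESHOLD = 1.0  # hypothetical cut-off on mean NLL, chosen by hand for this toy

if __name__ == "__main__":
    for name, policy in [("passive", passive), ("experimenter", experimenter)]:
        score = mean_nll(policy, shifted=True)
        print(f"{name}: mean NLL = {score:.2f}, flagged = {score > THRESHOLD}")
```

In this toy, the passive policy produces identical scores in the shifted and unshifted environments, because the shift only touches transitions it never takes. The alternating policy exposes the shift. A real detector would need a learned model and a calibrated threshold rather than the hand-set values used here.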

