LG · AI · CV · RO · Oct 31, 2022

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

MILA
arXiv:2211.00164v2 · 14 citations · h-index: 72
Originality: Incremental advance
AI Analysis

This work addresses a key problem for real-world RL applications, such as robotics, by improving robustness to irrelevant visual information, though it is incremental in building on existing theoretical concepts.

The paper tackles offline reinforcement learning from pixel-based visual observations that contain complex exogenous noise. It proposes Agent-Controller Representations (ACRO), which use multi-step inverse models to learn control-relevant representations without rewards, and reports better performance than baselines on newly introduced benchmarks.

Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenous information, i.e., any control-irrelevant information contained in observations. For example, a robot navigating busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information, and introduce new offline RL benchmarks offering the ability to study this problem. We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications. To address this, we propose to use multi-step inverse models, which have seen a great deal of interest in the RL theory community, to learn Agent-Controller Representations for Offline-RL (ACRO). Despite being simple and requiring no reward, we show theoretically and empirically that the representation created by this objective greatly outperforms baselines.
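
The multi-step inverse idea in the abstract can be illustrated with a small sketch: from an offline trajectory, draw an observation x_t, a later observation x_{t+k}, and the first action a_t taken in between, then train an encoder so that a_t is predictable from the two encoded observations. No reward is used. The code below is a hedged illustration in plain Python, not the authors' implementation. The names `sample_inverse_pairs`, `inverse_loss`, `encoder`, and `action_head` are hypothetical, actions are assumed to be discrete, and passing k to the prediction head is an assumption rather than something the abstract states.

```python
import math
import random


def sample_inverse_pairs(observations, actions, max_k, num_samples, rng):
    """Draw (x_t, x_{t+k}, k, a_t) tuples from one offline trajectory.

    Assumes len(observations) == len(actions) + 1, where actions[t] was
    taken at observations[t] and led to observations[t + 1].
    """
    horizon = len(actions)
    samples = []
    for _ in range(num_samples):
        t = rng.randrange(horizon)                    # start index in [0, horizon - 1]
        k = rng.randint(1, min(max_k, horizon - t))   # look-ahead, kept inside the trajectory
        samples.append((observations[t], observations[t + k], k, actions[t]))
    return samples


def softmax_cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_z - logits[target]


def inverse_loss(samples, encoder, action_head):
    """Average cross-entropy of predicting the first action a_t.

    `encoder` maps an observation to a representation; `action_head` maps
    (z_t, z_{t+k}, k) to a list of action logits. Both are hypothetical
    callables supplied by the caller; no reward appears anywhere.
    """
    total = 0.0
    for x_t, x_tk, k, a_t in samples:
        logits = action_head(encoder(x_t), encoder(x_tk), k)
        total += softmax_cross_entropy(logits, a_t)
    return total / len(samples)
```

In practice the encoder and head would be neural networks trained by gradient descent on this loss. The intuition is that exogenous content such as background motion does not help predict the agent's own action, so an encoder trained this way has no incentive to keep it.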

Code Implementations: 2 repos
