LG · May 17, 2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

arXiv:2105.08140v1 · 225 citations
Originality: Incremental advance
AI Analysis

The paper addresses a key bottleneck in offline RL, one that matters for applications such as robotics and autonomous systems, though it is an incremental improvement over existing actor-critic methods.

Offline reinforcement learning methods fail when they bootstrap from out-of-distribution (OOD) actions or states. The paper proposes Uncertainty Weighted Actor-Critic (UWAC), which detects OOD state-action pairs and down-weights their contribution to the training objectives. The authors report more stable training, better results than existing offline RL methods on a variety of competitive tasks, and significant gains on datasets with sparse demonstrations from human experts.

Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
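To make the idea of dropout-based uncertainty weighting concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a toy two-layer Q-network, Monte Carlo dropout to estimate the variance of Q(s, a), and an inverse-variance weight clipped at a maximum value. The names `mc_dropout_q`, `uncertainty_weights`, `weighted_td_loss`, and the parameters `beta`, `max_weight`, `p_drop`, and `n_samples` are all hypothetical and chosen for illustration; the abstract does not specify the exact weighting formula.

```python
import numpy as np

def mc_dropout_q(params, s, a, rng, n_samples=32, p_drop=0.1):
    """Estimate mean and variance of Q(s, a) over stochastic dropout passes.

    params is a hypothetical tuple (w1, b1, w2, b2) for a two-layer network.
    """
    w1, b1, w2, b2 = params
    x = np.concatenate([s, a], axis=-1)        # (batch, state_dim + action_dim)
    h = np.maximum(x @ w1 + b1, 0.0)           # hidden ReLU features
    samples = []
    for _ in range(n_samples):
        # Dropout stays active at inference; each pass draws a new mask.
        mask = rng.random(h.shape) >= p_drop
        h_drop = h * mask / (1.0 - p_drop)     # inverted-dropout scaling
        samples.append((h_drop @ w2 + b2)[:, 0])
    q = np.stack(samples)                      # (n_samples, batch)
    return q.mean(axis=0), q.var(axis=0)

def uncertainty_weights(var, beta=1.0, max_weight=1.5):
    """Assumed weighting: inverse variance, clipped so confident pairs do not dominate."""
    return np.minimum(beta / (var + 1e-8), max_weight)

def weighted_td_loss(q_pred, target, weights):
    """Squared TD error in which high-uncertainty (likely OOD) pairs count less."""
    return float(np.mean(weights * (q_pred - target) ** 2))

# Usage on random data: 4 transitions, 3-dim states, 2-dim actions.
rng = np.random.default_rng(0)
params = (rng.normal(size=(5, 16)), np.zeros(16),
          rng.normal(size=(16, 1)), np.zeros(1))
s, a = rng.normal(size=(4, 3)), rng.normal(size=(4, 2))
q_mean, q_var = mc_dropout_q(params, s, a, rng)
w = uncertainty_weights(q_var)
loss = weighted_td_loss(q_mean, np.zeros(4), w)
```

In this sketch, state-action pairs whose Q-estimates vary a lot across dropout masks receive small weights, so their TD errors contribute less to the loss. The only extra cost over a standard critic update is the additional forward passes, which is in line with the low overhead the abstract describes.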

Code Implementations: 2 repos
