LG | May 17, 2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh

arXiv:2105.08140v132.3225 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck in offline RL for applications such as robotics and autonomous systems, though it is an incremental improvement over existing actor-critic methods.

The paper addresses a known failure mode of offline reinforcement learning: bootstrapping from out-of-distribution (OOD) actions or states. It proposes Uncertainty Weighted Actor-Critic (UWAC), which detects OOD state-action pairs and down-weights their contribution to the training objectives. The authors report more stable training and better performance than existing offline RL methods on a variety of competitive tasks, with significant gains on datasets of sparse demonstrations collected from human experts.

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC out-performs existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
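
The down-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear Q-function, the inverse-variance form of the weight, and the names `mc_dropout_q_samples`, `uncertainty_weight`, `weighted_td_loss`, `beta`, and `max_weight` are all assumptions made for illustration. The sketch evaluates a Q-function several times under random dropout masks, treats the variance of the outputs as an uncertainty estimate, and scales the squared TD error by a clipped weight that shrinks as the variance grows.

```python
import random
import statistics


def mc_dropout_q_samples(features, weights, drop_prob=0.1, num_samples=100, rng=None):
    """Evaluate a toy linear Q-function under random dropout masks.

    Each input feature is dropped independently with probability `drop_prob`;
    kept features are rescaled by 1 / (1 - drop_prob) (inverted dropout).
    The linear model is a stand-in for a real Q-network (assumption).
    """
    rng = rng or random.Random(0)
    keep = 1.0 - drop_prob
    samples = []
    for _ in range(num_samples):
        q = 0.0
        for x, w in zip(features, weights):
            if rng.random() < keep:
                q += (x / keep) * w
        samples.append(q)
    return samples


def uncertainty_weight(q_samples, beta=1.0, max_weight=1.0):
    """Map the variance of dropout samples to a loss weight.

    Assumed form: weight = min(beta / variance, max_weight), so pairs with
    high predictive variance contribute less to the objective.
    """
    var = statistics.pvariance(q_samples)
    if var == 0.0:
        return max_weight
    return min(beta / var, max_weight)


def weighted_td_loss(q_pred, td_target, weight):
    """Squared TD error scaled by the uncertainty weight."""
    return weight * (q_pred - td_target) ** 2


# Usage: a small-magnitude input versus a large-magnitude one that
# stands in for an OOD state-action pair (hypothetical data).
q_weights = [1.0, 1.0, 1.0]
w_in = uncertainty_weight(mc_dropout_q_samples([0.1, 0.2, 0.1], q_weights))
w_ood = uncertainty_weight(mc_dropout_q_samples([10.0, 20.0, 10.0], q_weights))
```

In this toy setup the large-magnitude input produces a much larger dropout variance, so its weight falls below that of the small-magnitude input. In a real actor-critic agent, the same weight would multiply the per-sample terms of the critic loss and the actor loss rather than a single scalar error.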
