LG · AI · ME · ML · Dec 28, 2022

On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

DeepMind
arXiv:2212.13936v1 · 29 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses stability and sample-efficiency issues in reinforcement learning for robotics. Its originality is incremental because it builds on existing KL-regularized methods.

The paper identifies pathological training dynamics in KL-regularized reinforcement learning from expert demonstrations that lead to slow, unstable, and suboptimal learning. It proposes a remedy based on non-parametric behavioral reference policies, which outperforms state-of-the-art methods on locomotion and manipulation tasks.
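To make the idea of a pathological KL term concrete, here is a minimal sketch that assumes one-dimensional Gaussian online and behavioral reference policies (an illustrative choice; the summary does not name the policy class) and a per-step reward penalized by alpha · KL(online ‖ reference). Under that assumption, the closed-form Gaussian KL and its gradient with respect to the online mean scale as 1/σ₀². A reference policy whose standard deviation σ₀ has become very small can therefore make the regularizer dominate the task reward. All function names and numbers are hypothetical.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) in one dimension."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def kl_grad_wrt_online_mean(mu_p, mu_q, sigma_q):
    """Derivative of the KL above with respect to the online mean mu_p."""
    return (mu_p - mu_q) / sigma_q ** 2


def kl_regularized_reward(reward, alpha, mu_p, sigma_p, mu_q, sigma_q):
    """Task reward minus alpha times KL(online policy || reference policy)."""
    return reward - alpha * gaussian_kl(mu_p, sigma_p, mu_q, sigma_q)


# Hypothetical state: online policy N(0.5, 0.3^2), reference mean 0.0,
# and a reference standard deviation that shrinks by 10x at each step.
for sigma_ref in (1.0, 0.1, 0.01):
    kl = gaussian_kl(0.5, 0.3, 0.0, sigma_ref)
    grad = kl_grad_wrt_online_mean(0.5, 0.0, sigma_ref)
    r = kl_regularized_reward(1.0, 0.1, 0.5, 0.3, 0.0, sigma_ref)
    print(f"sigma_ref={sigma_ref:<5} KL={kl:9.2f} dKL/dmu={grad:8.1f} reward={r:9.2f}")
```

With these numbers, the gradient with respect to the online mean grows a hundredfold each time σ₀ shrinks tenfold. This is one way an update could be driven by the regularizer rather than the task reward.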

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
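The abstract's remedy is a non-parametric behavioral reference policy. As one illustration, this sketch assumes a Gaussian process with a squared-exponential kernel fitted to hypothetical one-dimensional demonstration states and actions. The kernel, hyperparameters, and data are all illustrative assumptions, not the paper's setup. The GP's posterior predictive variance is small near the demonstrations and reverts to the prior variance far from them instead of collapsing. That is the property one would want if the pathology stems from an overconfident reference policy.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5, signal_var=1.0):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gp_predict(x_train, y_train, x_test, noise_var=1e-2, lengthscale=0.5, signal_var=1.0):
    """Exact GP regression: posterior mean and variance of the latent function at x_test."""
    k_train = rbf_kernel(x_train, x_train, lengthscale, signal_var)
    chol = np.linalg.cholesky(k_train + noise_var * np.eye(len(x_train)))
    k_cross = rbf_kernel(x_train, x_test, lengthscale, signal_var)  # (n_train, n_test)
    # Solve (K + noise * I)^{-1} y using the Cholesky factor.
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ weights
    # Prior variance minus the variance explained by the training data.
    v = np.linalg.solve(chol, k_cross)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off


# Hypothetical demonstrations: states in [-1, 1], expert actions sin(state).
x_demo = np.linspace(-1.0, 1.0, 9)
a_demo = np.sin(x_demo)
mean, var = gp_predict(x_demo, a_demo, np.array([0.0, 5.0]))
print("predictive variance at s=0.0 (near data):", var[0])
print("predictive variance at s=5.0 (far from data):", var[1])
```

Far from the demonstrations, the predicted variance returns to `signal_var`. A KL penalty against this reference therefore stays moderate there, rather than growing as 1/σ₀² would for a collapsed variance.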

Code Implementations: 1 repo
