LG · AI · Feb 23, 2022

Consistent Dropout for Policy Gradient Reinforcement Learning

arXiv:2202.11818v1 · 14 citations
Originality: Incremental advance
AI Analysis

This paper addresses a specific technical bottleneck for RL practitioners by enabling stable use of dropout in policy-gradient methods, though the advance is incremental because it adapts an existing technique.

The paper tackles the instability of naive dropout in policy-gradient reinforcement learning by introducing consistent dropout, which enables stable training with A2C and PPO in both continuous and discrete action environments and allows online training of architectures such as GPT without disabling their native dropout.

Dropout has long been a staple of supervised learning, but is rarely used in reinforcement learning. We analyze why naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique to address this instability. We demonstrate consistent dropout enables stable training with A2C and PPO in both continuous and discrete action environments across a wide range of dropout probabilities. Finally, we show that consistent dropout enables the online training of complex architectures such as GPT without needing to disable the model's native dropout.
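The abstract does not spell out the mechanism, so the following is only a minimal sketch under a stated assumption: that "consistent" means the dropout mask sampled when an action is chosen during the rollout is stored and reused when the log-probability is recomputed in the update. The toy softmax policy and the helper names `sample_mask` and `log_prob` are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_mask(n, p, rng):
    """Inverted-dropout mask: keep each unit with probability 1 - p, then rescale."""
    keep = 1.0 - p
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(n)]


def log_prob(weights, obs, mask, action):
    """Log-probability of `action` under a softmax policy over masked features."""
    hidden = [x * m for x, m in zip(obs, mask)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return logits[action] - log_z


rng = random.Random(0)
obs = [0.5, -1.2, 0.8, 2.0]
weights = [[0.3, -0.1, 0.7, 0.2], [-0.4, 0.6, 0.1, -0.3]]  # 2 actions x 4 features

# Rollout: sample a mask, act, and store the mask alongside the transition.
rollout_mask = sample_mask(len(obs), 0.5, rng)
action = 1
old_logp = log_prob(weights, obs, rollout_mask, action)

# Update with the stored mask (assumed "consistent" behaviour): the recomputed
# log-probability equals the rollout one, so the ratio is 1 before any update.
consistent_ratio = math.exp(log_prob(weights, obs, rollout_mask, action) - old_logp)

# Update with freshly sampled masks (naive dropout): the ratio can move away
# from 1 even though the parameters have not changed.
naive_ratios = [
    math.exp(log_prob(weights, obs, sample_mask(len(obs), 0.5, rng), action) - old_logp)
    for _ in range(5)
]

print("consistent:", consistent_ratio)
print("naive:", naive_ratios)
```

In this toy setting the stored mask keeps the recomputed log-probability identical to the one recorded at rollout time, whereas fresh masks perturb it independently of any parameter change. That mismatch is one plausible source of the instability the abstract attributes to naive dropout.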

