LG AI, Feb 23, 2022

Consistent Dropout for Policy Gradient Reinforcement Learning

arXiv:2202.11818v18.714 citations

Originality: Incremental advance

AI Analysis

This paper addresses a specific technical bottleneck for RL practitioners by enabling stable use of dropout in policy-gradient methods. The advance is incremental, since it adapts an existing technique.

The paper tackles the instability of naive dropout in policy-gradient reinforcement learning by introducing consistent dropout, which enables stable training with A2C and PPO in both continuous and discrete action environments and allows architectures such as GPT to be trained online without disabling the model's native dropout.

Dropout has long been a staple of supervised learning, but is rarely used in reinforcement learning. We analyze why naive application of dropout is problematic for policy-gradient learning algorithms and introduce consistent dropout, a simple technique to address this instability. We demonstrate consistent dropout enables stable training with A2C and PPO in both continuous and discrete action environments across a wide range of dropout probabilities. Finally, we show that consistent dropout enables the online training of complex architectures such as GPT without needing to disable the model's native dropout.
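The abstract does not spell out the mechanism, so the following is only a minimal sketch under a stated assumption: that "consistent dropout" means storing the dropout mask sampled when an action is chosen during the rollout and reusing that same mask when the policy is re-evaluated in the update. The toy linear softmax policy and the names `sample_mask` and `policy_probs` are hypothetical and are not taken from the paper. The sketch compares the PPO-style probability ratio before any parameter change, which should equal 1 when the update sees the same policy that acted.

```python
import math
import random


def sample_mask(n, p, rng):
    """Inverted-dropout mask: keep each unit with probability 1 - p, scale kept units by 1 / (1 - p)."""
    keep = 1.0 - p
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(n)]


def policy_probs(weights, features, mask):
    """Softmax action probabilities of a toy linear policy, with `mask` applied to the input features."""
    dropped = [f * m for f, m in zip(features, mask)]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
weights = [[0.5, -1.0, 2.0, 0.3], [-0.7, 1.5, 0.1, -2.0]]  # 2 actions, 4 features (illustrative values)
features = [1.0, 2.0, -1.0, 0.5]
p = 0.5  # dropout probability

# Rollout: sample a mask, compute action probabilities, and store the mask with the transition.
rollout_mask = sample_mask(len(features), p, rng)
rollout_probs = policy_probs(weights, features, rollout_mask)

# Naive update (assumed failure mode): a fresh mask is drawn, so the re-evaluated
# probabilities can differ from the rollout even though the weights are unchanged.
naive_probs = policy_probs(weights, features, sample_mask(len(features), p, rng))

# Consistent update (assumed fix): reuse the stored rollout mask.
consistent_probs = policy_probs(weights, features, rollout_mask)

ratio_naive = naive_probs[0] / rollout_probs[0]
ratio_consistent = consistent_probs[0] / rollout_probs[0]
print("naive ratio:", ratio_naive)
print("consistent ratio:", ratio_consistent)
```

With the stored mask, the ratio is exactly 1 before the weights change, so any later deviation reflects the parameter update rather than dropout noise. With a fresh mask, the ratio can move away from 1 purely because a different sub-network is evaluated.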
