LG · AI · Jun 1, 2021

An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

arXiv:2106.00707v1 · 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses a key limitation of policy-based RL methods, the policy collapse problem, and offers a novel mechanism to improve sample efficiency and performance. The advance appears incremental, however, because it adapts ideas from value-based methods.

The paper tackles the policy collapse problem in policy-based reinforcement learning by proposing an entropy regularization free mechanism that achieves Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off, boosting a policy-based baseline to a new State-Of-The-Art on the Arcade Learning Environment.

Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with the ε-greedy mechanism enjoy three characteristics: Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, there does not exist a parallel mechanism for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy regularization free mechanism designed for policy-based methods, which achieves Closed-form Diversity, Objective-invariant Exploration and Adaptive Trade-off. Our experiments show that our mechanism is super sample-efficient for policy-based methods and boosts a policy-based baseline to a new State-Of-The-Art on the Arcade Learning Environment.
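For readers unfamiliar with the ε-greedy mechanism the abstract refers to, the following is a minimal sketch of the standard textbook rule for value-based methods. It is not the paper's proposed mechanism, which this summary does not specify. The function names `epsilon_greedy_probs` and `sample_action` are hypothetical, and reading "Closed-form Diversity" as "the action probabilities can be written down explicitly" is an assumption made here for illustration only.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the explicit action distribution of a standard epsilon-greedy policy.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_actions = len(q_values)
    # Index of the action with the highest estimated value (first one on ties).
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    # Uniform exploration: every action receives epsilon / n_actions.
    probs = [epsilon / n_actions] * n_actions
    # The greedy action additionally receives the remaining 1 - epsilon mass.
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng=random):
    """Draw one action index from the epsilon-greedy distribution."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under this rule the value estimates decide only which action is greedy, while `epsilon` alone sets how much probability goes to the other actions. No entropy term is added to the learning objective, which is one way to picture the contrast with entropy-regularized policy-gradient methods.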

