LG | AI | ML | Aug 25, 2020

Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

arXiv:2008.10806v1 | 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses policy instability in reinforcement learning, which matters for applications such as robotics. The advance is incremental because it builds on existing entropy-regularization methods.

The paper tackles monotonic policy improvement in entropy-regularized value-based reinforcement learning by deriving a new lower bound that scales to large state spaces. The resulting algorithm alleviates policy oscillation and is shown to be effective on both discrete and continuous tasks.

This paper aims to establish an entropy-regularized value-based reinforcement learning method that can ensure the monotonic improvement of policies at each policy update. Unlike previously proposed lower bounds on policy improvement in general infinite-horizon MDPs, we derive an entropy-regularization aware lower bound. Since our bound only requires the expected policy advantage function to be estimated, it is scalable to large-scale (continuous) state-space problems. We propose a novel reinforcement learning algorithm that exploits this lower bound as a criterion for adjusting the degree of a policy update to alleviate policy oscillation. We demonstrate the effectiveness of our approach in both discrete-state maze and continuous-state inverted pendulum tasks using a linear function approximator for value estimation.
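To make the idea of "adjusting the degree of a policy update" concrete, here is a minimal sketch in Python. It is not the paper's algorithm or its bound, neither of which is given in the abstract. It assumes a generic quadratic lower bound of the form `zeta * exp_adv - penalty * zeta**2` (in the style of conservative policy iteration), a Boltzmann policy with a hypothetical inverse temperature `beta`, and made-up values for the Q-table, state weights, and `penalty`. All function and variable names are illustrative.

```python
import numpy as np

def softmax_policy(q_values, beta):
    """Boltzmann policy per state; beta is an assumed inverse temperature."""
    z = beta * (q_values - q_values.max(axis=1, keepdims=True))  # stabilize exp
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def expected_policy_advantage(state_weights, pi_new, advantage):
    """sum_s d(s) sum_a pi_new(a|s) A(s, a), with A taken w.r.t. the current policy."""
    return float(np.sum(state_weights[:, None] * pi_new * advantage))

def mixing_coefficient(exp_adv, penalty):
    """Maximize the assumed bound zeta*exp_adv - penalty*zeta**2 over zeta in [0, 1]."""
    if exp_adv <= 0.0:
        return 0.0  # no guaranteed improvement under this bound: keep the old policy
    return float(min(1.0, exp_adv / (2.0 * penalty)))

def cautious_update(pi_old, pi_new, zeta):
    """Interpolate between the current policy and the candidate policy."""
    return (1.0 - zeta) * pi_old + zeta * pi_new

# Toy example: 2 states, 2 actions (all numbers are illustrative).
q = np.array([[1.0, 0.0], [0.0, 2.0]])
pi_old = np.full((2, 2), 0.5)                    # current policy: uniform
v_old = np.sum(pi_old * q, axis=1, keepdims=True)
advantage = q - v_old                            # advantage under the current policy
pi_new = softmax_policy(q, beta=1.0)             # candidate policy
state_weights = np.array([0.5, 0.5])             # assumed state distribution

exp_adv = expected_policy_advantage(state_weights, pi_new, advantage)
zeta = mixing_coefficient(exp_adv, penalty=1.0)  # penalty constant is an assumption
pi_mixed = cautious_update(pi_old, pi_new, zeta)
```

The design point this illustrates is that the step size depends only on a scalar expected advantage, which can be estimated from samples, rather than on a per-state maximum. That is the property the abstract credits for scalability to large or continuous state spaces.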

