LG, ML · May 18, 2020

Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

arXiv:2005.08844v2
AI Analysis

This work addresses a foundational challenge in reinforcement learning by providing a unified framework, though it appears incremental as it builds on existing entropy-based methods.

The paper tackles the problem of connecting policy gradient and Q-learning methods in reinforcement learning by introducing an entropy-augmented objective with KL-divergence regularization, resulting in a continuous algorithm that interpolates between these extremes and shows performance gains in experiments.

Augmenting the reward with an entropy term is known to soften the greedy argmax policy into a softmax policy. The paper reformulates entropy augmentation, which motivates adding a further entropy term to the objective function, in the form of a KL-divergence, to regularize the optimization process. The result is a policy that improves monotonically while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
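To make the interpolation concrete, here is a minimal single-state sketch in Python. It does not reproduce the paper's algorithm or its parameterization. It assumes a fixed vector of Q-values, a temperature `tau` for the entropy bonus, and a KL weight `beta` (all names here are illustrative), and uses the standard closed-form maximizer of E_p[Q] + tau * H(p) - beta * KL(p || pi), which is proportional to pi^(beta / (tau + beta)) * exp(Q / (tau + beta)). As `beta` goes to 0 this recovers the softmax greedy policy, and as `beta` grows it stays at the current policy.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def soft_value(q_values, policy, tau):
    # Entropy-augmented one-step objective: E_p[Q] + tau * H(p).
    expected_q = sum(p * q for p, q in zip(policy, q_values))
    entropy = -sum(p * math.log(p) for p in policy if p > 0.0)
    return expected_q + tau * entropy


def regularized_policy(q_values, current_policy, tau, beta):
    # Maximizer of E_p[Q] + tau * H(p) - beta * KL(p || current_policy).
    # Assumes every entry of current_policy is strictly positive.
    scale = tau + beta
    logits = [
        (q + beta * math.log(p)) / scale
        for q, p in zip(q_values, current_policy)
    ]
    return softmax(logits)


# Hypothetical numbers for one state with three actions.
q = [1.0, 2.0, 0.5]
pi = [0.6, 0.2, 0.2]
tau = 0.5

soft_greedy = softmax([x / tau for x in q])            # softened argmax
near_soft = regularized_policy(q, pi, tau, beta=1e-9)  # weak KL penalty
halfway = regularized_policy(q, pi, tau, beta=0.5)     # intermediate policy
near_current = regularized_policy(q, pi, tau, beta=1e9)  # strong KL penalty
```

For fixed Q-values, the intermediate policy scores at least as well as the current one on the entropy-augmented objective, because the KL penalty is zero at the current policy and non-negative elsewhere. This one-step property is only an illustration of the monotonic improvement the abstract describes, not a proof of the paper's result.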

