LG AI ML Jan 25, 2025

Divergence-Augmented Policy Optimization

Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang

arXiv:2501.15034v122.619 citations h-index: 23 Has Code NIPS

Originality: Incremental advance

AI Analysis

The paper addresses a key challenge for reinforcement learning practitioners who work with limited data, though it is an incremental improvement on existing methods.

The paper tackles instability and premature convergence in deep reinforcement learning when off-policy data are reused. It adds a Bregman divergence term between the behavior policy and the current policy to keep policy updates small and safe, and it reports better performance than other state-of-the-art algorithms on Atari games in data-scarce scenarios.

In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
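To make the notion of a Bregman divergence penalty concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: it assumes the negative-entropy generator (for which the Bregman divergence between two probability vectors reduces to the KL divergence), and it works only on the action probabilities of a single state, whereas the abstract describes a divergence over state distributions. The names `penalized_objective`, `penalty`, and the example numbers are hypothetical and chosen for illustration.

```python
import math


def neg_entropy(p):
    """Generator F(p) = sum_i p_i * log(p_i), with 0 * log(0) taken as 0."""
    return sum(x * math.log(x) for x in p if x > 0.0)


def neg_entropy_grad(p):
    """Gradient of F: dF/dp_i = log(p_i) + 1 (requires every p_i > 0)."""
    return [math.log(x) + 1.0 for x in p]


def bregman_divergence(p, q, f=neg_entropy, grad_f=neg_entropy_grad):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_f(q), p, q))
    return f(p) - f(q) - inner


def kl_divergence(p, q):
    """KL(p || q), used here only to check the negative-entropy special case."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def penalized_objective(pi, mu, advantages, penalty=1.0):
    """Hypothetical single-state surrogate (an assumption, not the paper's loss):
    expected advantage under the current policy pi, minus a Bregman penalty
    that grows as pi moves away from the behavior policy mu."""
    expected_adv = sum(p * a for p, a in zip(pi, advantages))
    return expected_adv - penalty * bregman_divergence(pi, mu)


# Illustrative numbers only: three actions in one state.
mu = [0.5, 0.3, 0.2]    # behavior policy that generated the data
pi = [0.6, 0.3, 0.1]    # current policy being optimized
adv = [1.0, 0.0, -1.0]  # made-up advantage estimates

print(bregman_divergence(pi, mu), kl_divergence(pi, mu))
print(penalized_objective(pi, mu, adv, penalty=1.0))
```

A larger `penalty` makes any departure from `mu` more costly, which is the sense in which a divergence term keeps updates small when the data come from an older policy.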
