LG · Mar 20, 2022

Smoothing Advantage Learning

arXiv:2203.10445v1 · 3 citations · h-index: 6
AI Analysis

This work addresses a training-stability problem that matters to reinforcement learning practitioners, but the contribution is incremental: it modifies an existing method, advantage learning.

The paper tackles the instability of advantage learning (AL) under function approximation by proposing smoothing advantage learning (SAL), which replaces the Bellman optimality operator with a smooth one. According to the authors' analysis, this stabilizes training and increases the action gap; the paper also gives a performance bound for approximate SAL.

Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors with action-gap-based regularization. Unfortunately, the method tends to be unstable in the case of function approximation. In this paper, we propose a simple variant of AL, named smoothing advantage learning (SAL), to alleviate this problem. The key to our method is to replace the original Bellman optimality operator in AL with a smooth one so as to obtain a more reliable estimate of the temporal difference target. We give a detailed account of the resulting action gap and the performance bound for approximate SAL. Further theoretical analysis reveals that the proposed value smoothing technique not only helps to stabilize the training procedure of AL by controlling the trade-off between the convergence rate and the upper bound of the approximation errors, but also helps to increase the action gap between the optimal and sub-optimal action values.
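
To make the idea of "replacing the Bellman optimality operator with a smooth one" concrete, here is a minimal tabular sketch. The AL update used is the standard form, T Q(s,a) - alpha * (V(s) - Q(s,a)) with V(s) = max_a Q(s,a). The smoothing step is an assumption made for illustration only: it mixes the current estimate with the Bellman backup using a hypothetical weight `omega`, and it may not match the operator defined in the paper. The one-state MDP, the function names, and the parameter values are likewise made up for this example.

```python
def bellman_backup(Q, P, R, gamma):
    """Bellman optimality backup: R(s,a) + gamma * E[max_a' Q(s',a')]."""
    V = [max(row) for row in Q]
    return [
        [R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def al_step(Q, P, R, gamma, alpha, omega=1.0):
    """One advantage-learning update with an optional smoothed target.

    omega = 1.0, alpha = 0.0 -> plain value iteration
    omega = 1.0, alpha > 0.0 -> standard advantage learning (AL)
    omega < 1.0              -> ASSUMED smoothing: the target is a convex
                                combination of Q and its Bellman backup
                                (illustrative, not the paper's definition)
    """
    TQ = bellman_backup(Q, P, R, gamma)
    V = [max(row) for row in Q]
    return [
        [(1.0 - omega) * Q[s][a] + omega * TQ[s][a] - alpha * (V[s] - Q[s][a])
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def iterate(Q, P, R, gamma, alpha, omega, n_steps=500):
    """Apply the update repeatedly from the initial table Q."""
    for _ in range(n_steps):
        Q = al_step(Q, P, R, gamma, alpha, omega)
    return Q


def action_gap(Q, s):
    """Difference between the best and second-best action value in state s."""
    best, second = sorted(Q[s], reverse=True)[:2]
    return best - second


# Hypothetical one-state MDP: both actions loop back to the same state.
P = [[[1.0], [1.0]]]   # P[s][a][s'] transition probabilities
R = [[1.0, 0.0]]       # action 0 pays 1, action 1 pays 0
gamma = 0.5
Q0 = [[0.0, 0.0]]

q_bellman = iterate(Q0, P, R, gamma, alpha=0.0, omega=1.0)
q_al = iterate(Q0, P, R, gamma, alpha=0.5, omega=1.0)
q_smooth = iterate(Q0, P, R, gamma, alpha=0.5, omega=0.8)

for name, q in [("bellman", q_bellman), ("AL", q_al), ("smoothed AL", q_smooth)]:
    print(name, "greedy value:", round(max(q[0]), 4),
          "action gap:", round(action_gap(q, 0), 4))
```

In this toy problem all three variants agree on the greedy value (2.0), while the action gap grows from 1.0 under plain value iteration to 2.0 under AL and to roughly 2.67 with the assumed smoothing. This mirrors the qualitative claim in the abstract, but it is a property of this particular sketch (which needs `alpha < omega` to converge here), not a reproduction of the paper's results.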
