LG · AI · Mar 20, 2022

Robust Action Gap Increasing with Clipped Advantage Learning

arXiv:2203.11677v1 · 3 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses a robustness problem in reinforcement learning that is relevant to practitioners, but it is incremental because it builds on existing Advantage Learning methods.

The paper tackles a problem in Advantage Learning: increasing the action gap can introduce errors when the optimal action under the approximate value function differs from the true optimal action. It proposes clipped Advantage Learning, which adjusts advantages adaptively, and reports faster convergence and better performance on RL benchmarks.
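For reference, the original Advantage Learning operator (Bellemare et al., 2016, "Increasing the Action Gap") can be written in the tabular case as below. The notation (reward r, transition kernel P, discount γ, AL coefficient α) is the standard MDP notation and is not taken from this page. The fixed point illustrates the uniform gap increase that clipped AL targets: every gap of Q* is scaled by the same factor 1/(1-α), whether or not that state-action pair needs a larger gap.

```latex
\mathcal{T}Q(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{b} Q(s',b),
\qquad V(s) = \max_{b} Q(s,b)

\mathcal{T}_{\mathrm{AL}}Q(s,a) = \mathcal{T}Q(s,a) - \alpha \bigl[ V(s) - Q(s,a) \bigr],
\qquad \alpha \in [0,1)

% Tabular fixed point of T_AL (greedy action unchanged, V = V^*):
Q_{\mathrm{AL}}(s,a) = \frac{Q^*(s,a) - \alpha V^*(s)}{1-\alpha}
\quad\Longrightarrow\quad
V^*(s) - Q_{\mathrm{AL}}(s,a) = \frac{V^*(s) - Q^*(s,a)}{1-\alpha}
```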

Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve robustness to estimation errors. However, the method becomes problematic when the optimal action induced by the approximate value function does not agree with the true optimal action. In this paper, we present a novel method, named clipped Advantage Learning (clipped AL), to address this issue. The method is inspired by our observation that increasing the action gap blindly for all samples, without considering whether each sample needs it, can accumulate more errors in the performance loss bound and slow down value convergence; to avoid this, the advantage value should be adjusted adaptively. We show that our simple clipped AL operator not only enjoys a fast convergence guarantee but also retains proper action gaps, thus achieving a good balance between large action gaps and fast convergence. The feasibility and effectiveness of the proposed method are verified empirically on several RL benchmarks, with promising performance.
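A minimal NumPy sketch, assuming a small random tabular MDP with exact backups, contrasts the Bellman backup, the standard AL backup and a clipped variant. The clipping rule used here, which caps the gap-increasing term at a hypothetical threshold `clip`, is an illustrative assumption and not necessarily the operator defined in the paper; all function names and constants are hypothetical.

```python
import numpy as np


def bellman_backup(Q, R, P, gamma):
    """Exact Bellman optimality backup: R(s,a) + gamma * E_{s'}[max_b Q(s', b)]."""
    V = Q.max(axis=1)            # V(s') = max_b Q(s', b), shape (S,)
    return R + gamma * (P @ V)   # P has shape (S, A, S), so P @ V has shape (S, A)


def action_gaps(Q):
    """Action gap V(s) - Q(s, a): zero at the greedy action, non-negative elsewhere."""
    return Q.max(axis=1, keepdims=True) - Q


def al_backup(Q, R, P, gamma, alpha):
    """Standard Advantage Learning backup: subtracts alpha * gap from every action."""
    return bellman_backup(Q, R, P, gamma) - alpha * action_gaps(Q)


def capped_al_backup(Q, R, P, gamma, alpha, clip):
    """HYPOTHETICAL clipped variant: the gap-increasing term is capped at `clip`.

    This is an illustrative assumption, not necessarily the paper's clipped AL operator.
    """
    return bellman_backup(Q, R, P, gamma) - alpha * np.minimum(action_gaps(Q), clip)


def iterate(backup, shape, n_iters=3000):
    """Apply a backup repeatedly, starting from Q = 0."""
    Q = np.zeros(shape)
    for _ in range(n_iters):
        Q = backup(Q)
    return Q


# Small random tabular MDP (sizes and constants are arbitrary choices).
rng = np.random.default_rng(0)
S, A = 6, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # normalise each transition distribution
R = rng.random((S, A))
gamma, alpha, clip = 0.9, 0.5, 0.1

Q_star = iterate(lambda Q: bellman_backup(Q, R, P, gamma), (S, A))
Q_al = iterate(lambda Q: al_backup(Q, R, P, gamma, alpha), (S, A))
Q_clip = iterate(lambda Q: capped_al_backup(Q, R, P, gamma, alpha, clip), (S, A))

for name, Q in [("Bellman", Q_star), ("AL", Q_al), ("capped AL", Q_clip)]:
    print(f"{name:10s} greedy={Q.argmax(axis=1)} mean gap={action_gaps(Q).mean():.4f}")
```

With α = 0.5, plain AL doubles every gap of Q*. Under the assumed cap, a gap is doubled only when the doubled value stays within `clip`; larger gaps grow by just `alpha * clip`. In this run all three backups should share the same greedy policy, since both AL-style updates only ever lower non-greedy action values relative to the Bellman backup.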
