LG · AI · Sep 13, 2023

Attention Loss Adjusted Prioritized Experience Replay

arXiv:2309.06684v2 · 4 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a specific technical bottleneck in deep reinforcement learning and offers researchers and practitioners an incremental improvement over existing PER methods.

The paper tackles the Q-value estimation error caused by non-uniform sampling in Prioritized Experience Replay (PER). It proposes the ALAP algorithm, which combines a Self-Attention network with a Double-Sampling mechanism to adjust the importance sampling weights, and reports that the method is effective across several reinforcement learning algorithms.
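The estimation error mentioned above comes from sampling transitions in proportion to their priority rather than uniformly. A minimal sketch, using hypothetical toy numbers and the standard proportional-PER formulas (not ALAP itself), shows the effect: the expected sampled value drifts away from the uniform mean, and importance sampling (IS) weights with exponent beta = 1 pull it back.

```python
# Toy illustration with hypothetical numbers: priority-based sampling biases an
# expectation, and importance-sampling (IS) weights correct that bias.
# Uses the standard proportional-PER formulas, not the ALAP algorithm.

def sampling_probs(priorities, alpha):
    """P(i) = p_i^alpha / sum_k p_k^alpha (proportional prioritization)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def expected_update(values, probs, beta):
    """Exact expectation of the IS-weighted value under prioritized sampling.

    Uses w_i = (N * P(i))^(-beta) without max-normalization so the result
    can be compared directly with the uniform mean.
    """
    n = len(values)
    return sum(p * (n * p) ** (-beta) * v for p, v in zip(probs, values))


values = [1.0, 2.0, 3.0, 10.0]     # hypothetical per-transition quantities
priorities = [0.1, 0.1, 0.1, 1.0]  # hypothetical |TD-error| priorities

uniform_mean = sum(values) / len(values)
probs = sampling_probs(priorities, alpha=1.0)

biased = expected_update(values, probs, beta=0.0)     # no correction
corrected = expected_update(values, probs, beta=1.0)  # full correction

print("uniform mean:", uniform_mean)
for beta in (0.0, 0.5, 1.0):
    print("beta =", beta, "->", round(expected_update(values, probs, beta), 3))
```

With beta = 0 the high-priority transition dominates the expectation. Intermediate beta values give only partial correction, which is why the choice of this exponent matters.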

Prioritized Experience Replay (PER) is a deep reinforcement learning technique that speeds up neural network training by preferentially selecting experience samples that carry more information. However, the non-uniform sampling used in PER inevitably shifts the distribution over the state-action space and introduces estimation error into the Q-value function. This paper proposes the Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm, which combines an improved Self-Attention network with a Double-Sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, thereby eliminating the estimation error caused by PER. To verify the algorithm's effectiveness and generality, ALAP is tested with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantages and efficiency of the proposed training framework.
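In the original PER formulation (Schaul et al.), the hyperparameter that regulates the importance sampling weights is the exponent beta. It is typically annealed linearly toward 1 on a fixed schedule, and ALAP replaces that schedule with a fitted value. The sketch below shows only that conventional baseline: normalized IS weights plus a linear beta schedule. The function names, the starting value `beta0=0.4`, and the probabilities are illustrative assumptions. ALAP's attention-based fitting of the hyperparameter is not reproduced here.

```python
# Baseline PER importance-sampling weights with a linearly annealed beta.
# This is the conventional schedule that ALAP aims to replace; names and
# default values here are illustrative assumptions, not taken from the paper.

def is_weights(probs, beta):
    """Normalized IS weights for sampled transitions.

    w_i = (N * P(i))^(-beta), divided by max_j w_j so the largest weight is 1
    and gradient updates are only ever scaled down.
    """
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    w_max = max(raw)
    return [w / w_max for w in raw]


def linear_beta(step, total_steps, beta0=0.4):
    """Anneal beta linearly from beta0 at step 0 to 1.0 at total_steps."""
    frac = min(step / total_steps, 1.0)
    return beta0 + frac * (1.0 - beta0)


probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical sampling probabilities P(i)
for step in (0, 50_000, 100_000):
    beta = linear_beta(step, total_steps=100_000)
    weights = is_weights(probs, beta)
    print(step, round(beta, 2), [round(w, 3) for w in weights])
```

Frequently sampled (high-probability) transitions receive smaller weights, and the gap widens as beta grows. A fixed schedule like this cannot react to how large the sampling bias actually is during training, which is the gap a learned beta is meant to close.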

