LG ML · Jun 23, 2025

Reliability-Adjusted Prioritized Experience Replay

Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer

arXiv:2506.18482v2 · 3 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses data efficiency in reinforcement learning agents, but it is incremental as it extends an existing method (PER) with a reliability adjustment.

The paper tackles inefficient sampling in experience replay for reinforcement learning. It proposes Reliability-adjusted Prioritized Experience Replay (ReaPER), which adds a measure of temporal difference error reliability to Prioritized Experience Replay (PER), and it reports that ReaPER outperforms PER empirically across several environment types, including the Atari-10 benchmark.

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
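To make the sampling idea concrete, here is a minimal sketch of the standard proportional PER rule that the abstract builds on: transition i is drawn with probability p_i^alpha / sum_k p_k^alpha, where p_i = |TD error| + eps, and importance weights (N * P(i))^(-beta) correct the resulting bias. The abstract does not define the reliability measure, so the `reliability` argument below is a purely hypothetical placeholder (a per-transition score in [0, 1] that scales the TD error) and should not be read as the ReaPER formula. The function names and the default values for `alpha`, `beta`, and `eps` are likewise assumptions chosen for illustration.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6, reliability=None):
    """Proportional PER sampling probabilities.

    `reliability` is a HYPOTHETICAL placeholder, not the paper's measure:
    an optional per-transition score that scales the absolute TD error.
    """
    if reliability is None:
        reliability = [1.0] * len(td_errors)
    if len(reliability) != len(td_errors):
        raise ValueError("reliability and td_errors must have equal length")
    # eps keeps zero-error transitions sampleable; alpha controls how
    # strongly priorities skew sampling (alpha = 0 gives uniform sampling).
    priorities = [(r * abs(d) + eps) ** alpha
                  for d, r in zip(td_errors, reliability)]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights, normalized so the largest weight is 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_indices(probs, batch_size, rng=None):
    """Draw a batch of buffer indices (with replacement) according to probs."""
    if rng is None:
        rng = random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    td = [0.5, -2.0, 0.0]
    probs = per_probabilities(td)
    batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
    print(probs, importance_weights(probs), batch)
```

With the placeholder left at its default, this reduces to plain PER: transitions with larger absolute TD error are sampled more often and receive smaller importance weights.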
