LG · AI · May 11

When Does Non-Uniform Replay Matter in Reinforcement Learning?

Michal Korniak, Mikołaj Czarnecki, Yarden As, Piotr Miłoś, Pieter Abbeel, Michal Nauman

arXiv:2605.10236 · 61.2

Predicted impact: top 36% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work provides practical guidance for replay buffer design in off-policy RL, clarifying conditions under which non-uniform replay is beneficial.

The paper investigates when non-uniform replay sampling improves over uniform replay in off-policy reinforcement learning, identifying replay volume, expected recency, and sampling entropy as key factors. It proposes Truncated Geometric replay, which improves sample efficiency in low-volume regimes while remaining competitive at high volume across multiple benchmarks.

Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling, and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is clarifying when non-uniform replay is beneficial and providing practical guidance for replay design in modern off-policy RL. Namely, we find that non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay that biases sampling toward recent experience while preserving high entropy and incurring negligible computational overhead. Across large-scale parallel simulation, single-task, and multi-task settings, including three modern algorithms evaluated on five RL benchmark suites, this replay sampling strategy improves sample efficiency in low-volume regimes while remaining competitive when replay volume is high.
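The abstract does not spell out the exact form of Truncated Geometric replay, so the following is only a minimal sketch under stated assumptions: the age k of a sampled transition (k = 0 is the newest) is drawn with probability proportional to (1 - p)^k, truncated to the buffer size. The function names, the parameter p, and the use of mean age as a proxy for "expected recency" are illustrative choices, not taken from the paper.

```python
import numpy as np


def truncated_geometric_probs(buffer_size: int, p: float) -> np.ndarray:
    """Assumed form: P(age = k) proportional to (1 - p)**k, k = 0..buffer_size-1.

    Age 0 is the most recent transition. p = 0 recovers uniform replay.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    ages = np.arange(buffer_size)
    weights = (1.0 - p) ** ages
    return weights / weights.sum()


def sample_ages(rng: np.random.Generator, buffer_size: int, p: float,
                batch_size: int) -> np.ndarray:
    """Draw a batch of transition ages (with replacement) from the distribution."""
    probs = truncated_geometric_probs(buffer_size, p)
    return rng.choice(buffer_size, size=batch_size, p=probs)


def expected_age(probs: np.ndarray) -> float:
    """Mean age of a sampled transition (lower means more recent)."""
    return float(np.dot(np.arange(len(probs)), probs))


def entropy(probs: np.ndarray) -> float:
    """Shannon entropy of the sampling distribution, in nats."""
    nonzero = probs[probs > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for p in (0.0, 0.001, 0.01):  # hypothetical values, for illustration only
        probs = truncated_geometric_probs(100_000, p)
        batch = sample_ages(rng, 100_000, p, batch_size=256)
        print(f"p={p}: expected age={expected_age(probs):.1f}, "
              f"entropy={entropy(probs):.2f} nats, batch mean age={batch.mean():.1f}")
```

In this sketch a small p shifts sampling toward recent transitions while keeping entropy close to the uniform maximum of log(buffer_size), which is one way to read the trade-off between expected recency and entropy described above. To index a circular buffer, an age would still need to be mapped to a storage slot, for example `(write_index - 1 - age) % buffer_size`, where `write_index` is a hypothetical name for the next slot to be written.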
