Experience Replay with Random Reshuffling
This work aims to stabilize learning and improve sample efficiency in reinforcement learning. Its contribution is incremental, since it adapts random reshuffling, a technique already established in supervised learning.
The paper improves experience replay in reinforcement learning by introducing sampling methods based on random reshuffling, and reports that they are effective on Atari benchmarks.
Experience replay is a key component in reinforcement learning for stabilizing learning and improving sample efficiency. Its typical implementation samples transitions with replacement from a replay buffer. In contrast, in supervised learning with a fixed dataset, it is common practice to shuffle the dataset every epoch and consume the data sequentially, which is called random reshuffling (RR). RR enjoys theoretically better convergence properties and has been shown empirically to outperform with-replacement sampling. To leverage the benefits of RR in reinforcement learning, we propose sampling methods that extend RR to experience replay, in both uniform and prioritized settings. We evaluate our sampling methods on Atari benchmarks, demonstrating their effectiveness in deep reinforcement learning.
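The difference between the two sampling schemes described above can be illustrated with a minimal sketch over a fixed set of indices. This is not the proposed method: it covers only the supervised-learning form of RR and plain with-replacement sampling, and it does not handle a buffer that grows or overwrites old transitions, nor prioritized sampling. The function names and parameters are hypothetical and chosen for illustration.

```python
import random


def with_replacement_batches(n_items, batch_size, n_batches, rng):
    """Draw every index independently, so an index may repeat or be skipped."""
    return [
        [rng.randrange(n_items) for _ in range(batch_size)]
        for _ in range(n_batches)
    ]


def random_reshuffling_batches(n_items, batch_size, n_epochs, rng):
    """Shuffle once per epoch, then consume the indices sequentially."""
    batches = []
    for _ in range(n_epochs):
        order = list(range(n_items))
        rng.shuffle(order)  # fresh permutation for each epoch
        for start in range(0, n_items, batch_size):
            batches.append(order[start:start + batch_size])
    return batches


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rr = random_reshuffling_batches(n_items=8, batch_size=4, n_epochs=2, rng=rng)
wr = with_replacement_batches(n_items=8, batch_size=4, n_batches=4, rng=rng)

# Under RR, each epoch (two batches of four here) covers every index exactly once.
epoch_1 = sorted(rr[0] + rr[1])
epoch_2 = sorted(rr[2] + rr[3])
```

With RR, every item is visited the same number of times over any whole number of epochs, whereas with-replacement sampling only matches that in expectation. Carrying this property over to a replay buffer whose contents change during training is the problem the proposed sampling methods address.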