ML · LG · Feb 1, 2025

Variance Reduction via Resampling and Experience Replay

arXiv:2502.00520v2 · h-index: 2
AI Analysis

This work addresses the theoretical gap in experience replay for reinforcement learning, offering foundational insights with broad applicability across machine learning tasks.

The paper addresses the limited theoretical understanding of experience replay in reinforcement learning by modeling it with resampled U- and V-statistics, which yields rigorous variance reduction guarantees. It reports improved stability and efficiency for policy evaluation with LSTD and a PDE-based model-free algorithm, particularly when data are scarce, and it reduces the time cost of kernel ridge regression from O(n^3) to as low as O(n^2).

Experience replay is a foundational technique in reinforcement learning that enhances learning stability by storing past experiences in a replay buffer and reusing them during training. Despite its practical success, its theoretical properties remain underexplored. In this paper, we present a theoretical framework that models experience replay using resampled $U$- and $V$-statistics, providing rigorous variance reduction guarantees. We apply this framework to policy evaluation tasks using the Least-Squares Temporal Difference (LSTD) algorithm and a Partial Differential Equation (PDE)-based model-free algorithm, demonstrating significant improvements in stability and efficiency, particularly in data-scarce scenarios. Beyond policy evaluation, we extend the framework to kernel ridge regression, showing that the experience replay-based method reduces the time complexity from the traditional $O(n^3)$ to as low as $O(n^2)$ while simultaneously reducing variance. Extensive numerical experiments validate our theoretical findings, demonstrating the broad applicability and effectiveness of experience replay in diverse machine learning tasks.
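The variance-reduction idea behind resampled $U$-statistics can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes i.i.d. Gaussian samples in a small buffer, uses the order-2 kernel $h(x, y) = (x - y)^2 / 2$ (an unbiased estimator of the variance), and compares a baseline that uses each sample once with a replay-style estimator that repeatedly resamples pairs from the same buffer. All function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def kernel(x, y):
    # Symmetric order-2 kernel; its expectation over i.i.d. x, y is Var(X).
    return 0.5 * (x - y) ** 2


def disjoint_pairs_estimate(buffer):
    # Baseline: every sample is used exactly once, in n/2 non-overlapping pairs.
    # Assumes the buffer length is even.
    return kernel(buffer[0::2], buffer[1::2]).mean()


def resampled_u_statistic(buffer, num_draws, rng):
    # Replay-style estimate: draw index pairs (i, j) with i != j uniformly at
    # random from the buffer and average the kernel over the draws.
    n = len(buffer)
    i = rng.integers(0, n, size=num_draws)
    # Adding an offset in [1, n-1] modulo n guarantees j != i.
    j = (i + rng.integers(1, n, size=num_draws)) % n
    return kernel(buffer[i], buffer[j]).mean()


def compare_variances(n=20, num_draws=2000, trials=2000, seed=0):
    # Monte Carlo comparison over independent buffers of standard normal data
    # (an assumption made for this toy example; the true variance is 1).
    rng = np.random.default_rng(seed)
    base, replay = [], []
    for _ in range(trials):
        buffer = rng.normal(size=n)
        base.append(disjoint_pairs_estimate(buffer))
        replay.append(resampled_u_statistic(buffer, num_draws, rng))
    return {
        "base_mean": float(np.mean(base)),
        "base_var": float(np.var(base)),
        "replay_mean": float(np.mean(replay)),
        "replay_var": float(np.var(replay)),
    }


if __name__ == "__main__":
    for name, value in compare_variances().items():
        print(f"{name}: {value:.4f}")
```

Both estimators are unbiased for the same quantity, so any gap between `base_var` and `replay_var` comes only from reusing buffered samples across many pairs. In this setting the resampled estimator should approach the variance of the complete $U$-statistic as `num_draws` grows.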

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
