MLLGMay 7

Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under $τ$-Mixing

arXiv:2605.0637312.6
Predicted impact top 82% in ML · last 90 daysOriginality Incremental advance
AI Analysis

For reinforcement learning theorists, this work relaxes the unrealistic independence assumption in DQN analysis, offering rigorous finite-sample bounds under explicit temporal dependence.

This paper provides finite-sample guarantees for deep Q-learning under temporal dependence by modeling minibatches as τ-mixing, showing that dependence degrades the statistical rate via an additional dimensionality penalty and deriving sample complexity bounds.

Finite-sample analyses of deep Q-learning typically treat replayed data as independent, even though it is sampled from temporally dependent state-action trajectories. We study the Deep Q-networks (DQN) algorithm under explicit dependence by modelling the minibatches used for updating the network as $τ$-mixing. We show that this assumption holds under certain dependence conditions on the underlying trajectories and the mechanism used to sample minibatches. Building on this observation, we extend statistical analyses of DQN with fully connected ReLU architectures to dependent data. We formulate each update as a nonparametric regression problem with $τ$-mixing observations and derive finite-sample risk bounds under this dependence structure. Our results show that temporal dependence leads to a degradation in the statistical rate by inducing an additional dimensionality penalty in the rate exponent, reflecting the reduced effective sample size of $τ$-mixing data. Moreover, we derive the sample complexity of DQN under $tau$-mixing from these risk bounds. Finally, we empirically demonstrate on standard Gymnasium environments that the independence assumption is systematically violated and that replay sampling yields approximately exponentially decaying correlations, supporting our theoretical framework.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes