MLLGMay 17

On Gaussian approximation for entropy-regularized Q-learning with function approximation

arXiv:2605.1767854.9
Predicted impact top 22% in ML · last 90 daysOriginality Synthesis-oriented
AI Analysis

Provides theoretical guarantees for the asymptotic distribution of Q-learning iterates, relevant for statistical inference in reinforcement learning.

The paper derives a Gaussian approximation bound for Polyak-Ruppert averaged iterates in entropy-regularized Q-learning with linear function approximation, achieving a convergence rate of order n^{-1/4} up to polylog factors.

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-ω}$, $ω\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate, which might be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes