ML LGMay 17

On Gaussian approximation for entropy-regularized Q-learning with function approximation

Artemy Rubtsov, Rahul Singh, Eric Moulines, Alexey Naumov, Sergey Samsonov

arXiv:2605.1767854.9

Predicted impact top 22% in ML · last 90 daysOriginality Synthesis-oriented

AI Analysis

Provides theoretical guarantees for the asymptotic distribution of Q-learning iterates, relevant for statistical inference in reinforcement learning.

The paper derives a Gaussian approximation bound for Polyak-Ruppert averaged iterates in entropy-regularized Q-learning with linear function approximation, achieving a convergence rate of order n^{-1/4} up to polylog factors.

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-ω}$, $ω\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate, which might be of independent interest.

View on arXiv PDF

Similar