Stochastic Halpern iteration in normed spaces and applications to reinforcement learning
This work addresses computational efficiency challenges in reinforcement learning by providing improved convergence guarantees for fixed-point methods, with applications to model-free algorithms for Markov decision processes, though it is incremental relative to existing iterative methods.
The paper tackles the problem of approximating fixed-points of nonexpansive and contractive operators in normed spaces using stochastic Halpern iteration with minibatch, achieving an oracle complexity of $ ilde{O}(\varepsilon^{-5})$ for nonexpansive operators, improving upon recent rates, and establishing a lower bound of $\Omega(\varepsilon^{-3})$ for a broad class of algorithms.
We analyze the oracle complexity of the stochastic Halpern iteration with minibatch, where we aim to approximate fixed-points of nonexpansive and contractive operators in a normed finite-dimensional space. We show that if the underlying stochastic oracle has uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$, to obtain $\varepsilon$ expected fixed-point residual for nonexpansive operators, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. Also, we establish a lower bound of $Ω(\varepsilon^{-3})$ which applies to a wide range of algorithms, including all averaged iterations even with minibatching. Using a suitable modification of our approach, we derive a $O(\varepsilon^{-2}(1-γ)^{-3})$ complexity bound in the case in which the operator is a $γ$-contraction to obtain an approximation of the fixed-point. As an application, we propose new model-free algorithms for average and discounted reward MDPs. For the average reward case, our method applies to weakly communicating MDPs without requiring prior parameter knowledge.