OC LG PR ST ML | Dec 23, 2021

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

arXiv:2112.12770v2 17.231 citations

Originality: Incremental advance

AI Analysis

This work offers theoretical guarantees for stochastic approximation in reinforcement learning and time series, with potential impact on hyperparameter tuning, though it is incremental in refining existing bounds.

The paper analyzes stochastic approximation algorithms for solving linear fixed point equations from data generated by an ergodic Markov chain. It provides non-asymptotic error bounds and establishes instance-optimality of the averaged estimator, with sharp dependence on the dimension $d$ and the mixing time $t_{\mathrm{mix}}$. The results are applied to policy evaluation with the TD($\lambda$) family of algorithms and to linear autoregressive models.

We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($λ$) family of algorithms for all $λ\in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $λ$ when running the TD($λ$) algorithm).
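To make the setting in the abstract concrete, here is a minimal sketch of linear stochastic approximation driven by a single Markov trajectory, using TD(0) (the $\lambda = 0$ member of the TD($\lambda$) family) and a running average of the iterates. It is not the paper's exact scheme or tuning: the 4-state chain `P`, rewards `R`, state-aggregation features `PHI`, discount `GAMMA`, constant step size, and burn-in length are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 4-state ergodic chain.
P = np.array([
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.5, 0.2, 0.2],
    [0.2, 0.1, 0.5, 0.2],
    [0.2, 0.2, 0.1, 0.5],
])
R = np.array([1.0, 0.0, -1.0, 0.5])        # reward attached to each state
PHI = np.array([[1.0, 0.0], [1.0, 0.0],    # state-aggregation features, d = 2
                [0.0, 1.0], [0.0, 1.0]])
GAMMA = 0.5                                # discount factor (assumed)


def stationary_distribution(P):
    """Solve pi^T P = pi^T with sum(pi) = 1 by least squares."""
    k = P.shape[0]
    lhs = np.vstack([P.T - np.eye(k), np.ones((1, k))])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return pi


def td_fixed_point(P, r, phi, gamma):
    """Exact solution of the projected linear fixed point equation for TD(0)."""
    D = np.diag(stationary_distribution(P))
    A = phi.T @ D @ (phi - gamma * P @ phi)
    b = phi.T @ D @ r
    return np.linalg.solve(A, b)


def averaged_td0(P, r, phi, gamma, n, step, burn_in, seed=0):
    """Run TD(0) along one trajectory of length n.

    Returns the last iterate and the average of the iterates after burn-in.
    """
    rng = np.random.default_rng(seed)
    k, d = phi.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    s = rng.integers(k)
    for t in range(n):
        s_next = rng.choice(k, p=P[s])     # one step of the Markov chain
        td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta = theta + step * td_error * phi[s]
        if t >= burn_in:
            # Incremental mean of the post-burn-in iterates.
            theta_bar += (theta - theta_bar) / (t - burn_in + 1)
        s = s_next
    return theta, theta_bar


if __name__ == "__main__":
    theta_star = td_fixed_point(P, R, PHI, GAMMA)
    last, avg = averaged_td0(P, R, PHI, GAMMA, n=20000, step=0.05, burn_in=2000)
    print("last-iterate error:", np.linalg.norm(last - theta_star))
    print("averaged error:    ", np.linalg.norm(avg - theta_star))
```

Running the script prints the Euclidean distance of both estimates from the exact fixed point, which gives a rough empirical feel for the last-iterate versus averaged-iterate distinction the abstract draws. A single run on a toy chain does not verify the stated rates.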
