SYLGOCOct 16, 2024

Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

arXiv:2410.13067v15 citationsh-index: 4AISTATS
Originality Incremental advance
AI Analysis

This provides theoretical insights for optimization algorithms in machine learning, but it is incremental as it extends prior work on constant stepsizes.

The paper tackled the analysis of two-timescale stochastic approximation with constant stepsizes, proving convergence to a stationary distribution and deriving explicit rates, biases scaling as Θ(α)+Θ(β), and variances scaling as O(α) and O(β), leading to improved mean-squared error bounds of O(β^4 + 1/t).

Previous studies on two-timescale stochastic approximation (SA) mainly focused on bounding mean-squared errors under diminishing stepsize schemes. In this work, we investigate {\it constant} stpesize schemes through the lens of Markov processes, proving that the iterates of both timescales converge to a unique joint stationary distribution in Wasserstein metric. We derive explicit geometric and non-asymptotic convergence rates, as well as the variance and bias introduced by constant stepsizes in the presence of Markovian noise. Specifically, with two constant stepsizes $α< β$, we show that the biases scale linearly with both stepsizes as $Θ(α)+Θ(β)$ up to higher-order terms, while the variance of the slower iterate (resp., faster iterate) scales only with its own stepsize as $O(α)$ (resp., $O(β)$). Unlike previous work, our results require no additional assumptions such as $β^2 \ll α$ nor extra dependence on dimensions. These fine-grained characterizations allow tail-averaging and extrapolation techniques to reduce variance and bias, improving mean-squared error bound to $O(β^4 + \frac{1}{t})$ for both iterates.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes