High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise
This work addresses finite-time convergence guarantees for SGD in machine learning models with Markovian noise, which is incremental by extending existing analyses to broader noise conditions.
The paper presents the first uniform-in-time high-probability bound for Stochastic Gradient Descent (SGD) under the Polyak-Lojasiewicz (PL) condition with Markovian noise, achieving a matching 1/k decay rate for expected suboptimality. It demonstrates applicability in decentralized linear regression, supervised learning with privacy, and online system identification.
We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees, as the PL condition arises in many machine learning and deep learning models while Markovian noise naturally arises in decentralized optimization and online system identification problems. We further allow the magnitude of noise to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.