MLFeb 10, 2025
Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient DescentMarina Sheshukova, Sergey Samsonov, Denis Belomestny et al.
In this paper, we establish the non-asymptotic validity of the multiplier bootstrap procedure for constructing the confidence sets using the Stochastic Gradient Descent (SGD) algorithm. Under appropriate regularity conditions, our approach avoids the need to approximate the limiting covariance of Polyak-Ruppert SGD iterates, which allows us to derive approximation rates in convex distance of order up to $1/\sqrt{n}$. Notably, this rate can be faster than the one that can be proven in the Polyak-Juditsky central limit theorem. To our knowledge, this provides the first fully non-asymptotic bound on the accuracy of bootstrap approximations in SGD algorithms. Our analysis builds on the Gaussian approximation results for nonlinear statistics of independent random variables.
MLMay 25, 2025
Statistical inference for Linear Stochastic Approximation with Markovian NoiseSergey Samsonov, Marina Sheshukova, Eric Moulines et al.
In this paper we derive non-asymptotic Berry-Esseen bounds for Polyak-Ruppert averaged iterates of the Linear Stochastic Approximation (LSA) algorithm driven by the Markovian noise. Our analysis yields $\mathcal{O}(n^{-1/4})$ convergence rates to the Gaussian limit in the Kolmogorov distance. We further establish the non-asymptotic validity of a multiplier block bootstrap procedure for constructing the confidence intervals, guaranteeing consistent inference under Markovian sampling. Our work provides the first non-asymptotic guarantees on the rate of convergence of bootstrap-based confidence intervals for stochastic approximation with Markov noise. Moreover, we recover the classical rate of order $\mathcal{O}(n^{-1/8})$ up to logarithmic factors for estimating the asymptotic variance of the iterates of the LSA algorithm.
LGOct 22, 2025
Scalable LinUCB: Low-Rank Design Matrix Updates for Recommenders with Large Action SpacesEvgenia Shustova, Marina Sheshukova, Sergey Samsonov et al.
Linear contextual bandits, especially LinUCB, are widely used in recommender systems. However, its training, inference, and memory costs grow with feature dimensionality and the size of the action space. The key bottleneck becomes the need to update, invert and store a design matrix that absorbs contextual information from interaction history. In this paper, we introduce Scalable LinUCB, the algorithm that enables fast and memory efficient operations with the inverse regularized design matrix. We achieve this through a dynamical low-rank parametrization of its inverse Cholesky-style factors. We derive numerically stable rank-1 and batched updates that maintain the inverse without directly forming the entire matrix. To control memory growth, we employ a projector-splitting integrator for dynamical low-rank approximation, yielding average per-step update cost $O(dr)$ and memory $O(dr)$ for approximation rank $r$. Inference complexity of the suggested algorithm is $O(dr)$ per action evaluation. Experiments on recommender system datasets demonstrate the effectiveness of our algorithm.
MLSep 17, 2025
On the Rate of Gaussian Approximation for Linear Regression ProblemsMarat Khusainov, Marina Sheshukova, Alain Durmus et al.
In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate upon the problem dimension $d$ and quantities related to the design matrix. When the number of iterations $n$ is known in advance, our results yield the rate of normal approximation of order $\sqrt{\log{n}/n}$, provided that the sample size $n$ is large enough.
OCMay 25, 2023
First Order Methods with Markovian Noise: from Acceleration to Variational InequalitiesAleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova et al.
This paper delves into stochastic optimization problems that involve Markovian noise. We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers scenarios for both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the underlying noise sequence, we use the randomized batching scheme, which is based on the multilevel Monte Carlo method. Moreover, our technique allows us to eliminate the limiting assumptions of previous research on Markov noise, such as the need for a bounded domain and uniformly bounded stochastic gradients. Our extension to variational inequalities under Markovian noise is original. Additionally, we provide lower bounds that match the oracle complexity of our method in the case of strongly convex optimization problems.