Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence
This work addresses a foundational issue in stochastic optimization: it gives researchers and practitioners a theoretical tool for analyzing algorithms whose iterates may not converge under a constant stepsize. The contribution is incremental in that it builds on existing operator theory.
The paper analyzes the convergence of constant stepsize recursive stochastic algorithms by introducing a unified framework based on iterated random operator theory and a new notion of Wasserstein divergence. It shows that if the distribution of the iterates satisfies a contraction property with respect to this divergence, then the associated Markov chain admits an invariant distribution. This result enables convergence analysis for a large family of such algorithms.
This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs). RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea of our analysis is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To this end, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfies a contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework, and we provide several detailed examples.
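To illustrate why one studies the distribution of the iterates rather than the iterates themselves, the following minimal Python sketch runs constant stepsize SGD on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the objective f(x) = E[(x - Z)^2] / 2 with Z standard normal, the function names `sgd_step`, `coupled_gap`, and `tail_variance`, and the use of a synchronous coupling (two chains driven by the same noise) as a stand-in for a contraction argument. The coupling bounds the standard Wasserstein distance, not the Wasserstein divergence defined in the paper.

```python
import random
import statistics


def sgd_step(x, z, stepsize):
    """One constant stepsize SGD step on f(x) = E[(x - Z)^2] / 2 (toy objective, assumed)."""
    # The stochastic gradient at x for the sample z is (x - z).
    return x - stepsize * (x - z)


def coupled_gap(x0, y0, stepsize, n_steps, seed=0):
    """Run two chains from x0 and y0 with shared noise and return |x_n - y_n|."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # the same sample drives both chains
        x = sgd_step(x, z, stepsize)
        y = sgd_step(y, z, stepsize)
    return abs(x - y)


def tail_variance(x0, stepsize, n_steps, burn_in, seed=1):
    """Empirical variance of the iterates after discarding a burn-in period."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for k in range(n_steps):
        x = sgd_step(x, rng.gauss(0.0, 1.0), stepsize)
        if k >= burn_in:
            samples.append(x)
    return statistics.pvariance(samples)


if __name__ == "__main__":
    a = 0.1
    # The gap between coupled chains shrinks by a factor (1 - a) per step.
    print("coupled gap after 50 steps:", coupled_gap(5.0, -5.0, a, 50))
    # The iterates keep fluctuating; their variance settles near a / (2 - a).
    print("tail variance:", tail_variance(5.0, a, 200_000, 1_000))
    print("a / (2 - a):  ", a / (2 - a))
```

In this toy case the recursion is x_{k+1} = (1 - a) x_k + a Z_k, so the gap between the two coupled chains is exactly (1 - a)^k |x_0 - y_0|, while a single chain never settles at a point: its variance approaches a / (2 - a) > 0. The iterates therefore do not converge, but their distribution does, which is the situation the framework is designed to handle.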