Stochastic optimization with arbitrary recurrent data sampling
This work provides a theoretical foundation for accelerating convergence in stochastic optimization by optimizing data sampling strategies, with applications in decentralized optimization and matrix factorization.
The authors tackled the problem of achieving optimal first-order convergence in stochastic optimization by showing that only recurrence in data sampling is necessary, not additional properties like independence. They demonstrated that for non-convex and possibly non-smooth functions, the expected optimality gap converges at an optimal rate of O(n^{-1/2}) under general recurrent sampling schemes, with explicit dependence on recurrence speed.
For obtaining optimal first-order convergence guarantee for stochastic optimization, it is necessary to use a recurrent data sampling algorithm that samples every data point with sufficient frequency. Most commonly used data sampling algorithms (e.g., i.i.d., MCMC, random reshuffling) are indeed recurrent under mild assumptions. In this work, we show that for a particular class of stochastic optimization algorithms, we do not need any other property (e.g., independence, exponential mixing, and reshuffling) than recurrence in data sampling algorithms to guarantee the optimal rate of first-order convergence. Namely, using regularized versions of Minimization by Incremental Surrogate Optimization (MISO), we show that for non-convex and possibly non-smooth objective functions, the expected optimality gap converges at an optimal rate $O(n^{-1/2})$ under general recurrent sampling schemes. Furthermore, the implied constant depends explicitly on the `speed of recurrence', measured by the expected amount of time to visit a given data point either averaged (`target time') or supremized (`hitting time') over the current location. We demonstrate theoretically and empirically that convergence can be accelerated by selecting sampling algorithms that cover the data set most effectively. We discuss applications of our general framework to decentralized optimization and distributed non-negative matrix factorization.