Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
This work addresses optimization challenges in machine learning when data augmentation introduces stochastic perturbations, offering improved efficiency for training models with augmented datasets.
The paper tackles stochastic optimization under data augmentation, where the objective is no longer a finite sum. It introduces a variance reduction approach for composite, strongly convex objectives whose convergence rate improves on that of SGD through a smaller constant factor, one that depends only on the gradient variance due to perturbations of a single example.
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques cannot handle stochastic perturbations of the input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The resulting convergence rate improves on that of SGD through a typically much smaller constant factor, which depends only on the variance of gradient estimates due to perturbations of a single example.
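To make the distinction between the two variances concrete, the following is a minimal numerical sketch, not the paper's algorithm. It assumes a toy scalar least-squares loss with additive Gaussian perturbations of the input; the names `grad`, `variance_split`, `noise_std`, and `m` are hypothetical and chosen only for illustration. It estimates the total gradient variance seen by plain SGD (random example and random perturbation) and the average variance due to perturbations of a single fixed example. By the law of total variance, the first equals the second plus the variance of the per-example mean gradients.

```python
import random
import statistics


def grad(x, a, b, rho):
    """Gradient in x of the toy loss 0.5 * ((a + rho) * x - b)**2 (scalar x)."""
    a_pert = a + rho  # perturbed input, standing in for an augmented example
    return (a_pert * x - b) * a_pert


def variance_split(x, data, noise_std, m, seed=0):
    """Return (total variance, mean per-example perturbation variance,
    variance of per-example mean gradients) at the point x.

    Each of the n examples receives m independent Gaussian perturbations.
    """
    rng = random.Random(seed)
    per_example = [
        [grad(x, a, b, rng.gauss(0.0, noise_std)) for _ in range(m)]
        for a, b in data
    ]
    pooled = [g for grads in per_example for g in grads]

    # Variance when both the example and its perturbation are random (SGD view).
    sigma_tot2 = statistics.pvariance(pooled)
    # Variance from perturbations only, averaged over fixed examples.
    sigma_p2 = statistics.fmean([statistics.pvariance(g) for g in per_example])
    # Variance across examples of their mean gradient.
    between2 = statistics.pvariance([statistics.fmean(g) for g in per_example])
    return sigma_tot2, sigma_p2, between2


# Hypothetical synthetic data: 50 scalar (input, target) pairs.
data_rng = random.Random(1)
data = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(50)]

sigma_tot2, sigma_p2, between2 = variance_split(x=0.5, data=data, noise_std=0.1, m=200)
print("total variance:           ", sigma_tot2)
print("per-example perturbation: ", sigma_p2)
print("between-example variance: ", between2)
```

When perturbations are small relative to the differences between examples, the per-example perturbation variance is only a small part of the total, which is the regime in which a constant factor depending on that quantity alone would be expected to help.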