Variance Regularization for Accelerating Stochastic Optimization
This work addresses a general issue in stochastic optimization for machine learning practitioners, offering an incremental improvement to existing first-order methods.
The paper tackles the problem of random error accumulation in stochastic gradient-based optimization by proposing variance regularization of learning rates using mini-batch statistics, which accelerates convergence and stabilizes the process, as demonstrated empirically.
While most current gradient-based optimization methods focus on exploring high-dimensional geometric features, the random error that accumulates in the stochastic implementation of any such algorithm has received little attention. In this work, we propose a universal principle that reduces this error accumulation by exploiting the statistical information hidden in mini-batch gradients. This is achieved by regularizing the learning rate according to mini-batch variances. Because this perspective is complementary to geometry-based methods, the regularization could provide a further improvement for stochastic implementations of generic first-order approaches. Our empirical results demonstrate that variance regularization can speed up convergence as well as stabilize stochastic optimization.
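The abstract does not state the exact update rule, so the following is only a minimal sketch of one way a learning rate could be regularized by mini-batch variance. It assumes access to per-example gradients and damps each coordinate's step by a signal-to-noise factor, mean^2 / (mean^2 + variance of the mean). The function name `variance_regularized_step`, its parameters, and this particular scaling are illustrative assumptions, not the paper's method.

```python
from statistics import fmean, variance


def variance_regularized_step(params, per_example_grads, base_lr=0.1, eps=1e-12):
    """One SGD step with a per-coordinate, variance-damped learning rate.

    Hypothetical rule (an assumption, not taken from the paper):
        scale_i = mean_i^2 / (mean_i^2 + var_i / B + eps)
    where mean_i and var_i are the mini-batch mean and sample variance of
    coordinate i of the gradient, and B is the mini-batch size.
    """
    batch_size = len(per_example_grads)
    if batch_size < 2:
        raise ValueError("need at least two per-example gradients to estimate variance")

    new_params = []
    for i, theta in enumerate(params):
        # Gradient values of coordinate i across the mini-batch.
        column = [grad[i] for grad in per_example_grads]
        mean_grad = fmean(column)
        # Estimated variance of the mini-batch mean gradient.
        var_of_mean = variance(column) / batch_size
        # Close to 1 when the gradient is consistent, smaller when it is noisy.
        scale = mean_grad ** 2 / (mean_grad ** 2 + var_of_mean + eps)
        new_params.append(theta - base_lr * scale * mean_grad)
    return new_params
```

Under this assumed rule, a coordinate whose per-example gradients agree takes nearly the plain SGD step, while a coordinate with the same mean gradient but high mini-batch variance takes a smaller one. Because the factor only rescales the step, it could in principle wrap other first-order updates as well.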