SGD with Variance Reduction beyond Empirical Risk Minimization
This work speeds up optimization in survival analysis and similar settings where each gradient depends on a numerically expensive expectation.
The paper addresses the optimization of functions whose gradients involve expensive expectations, such as the regularized Cox partial-likelihood in survival analysis. It introduces a doubly stochastic proximal gradient algorithm that combines SGD with variance reduction and MCMC approximations of the inner expectations. The algorithm converges linearly under strong convexity and improves on the state-of-the-art solver for the regularized Cox partial-likelihood on several survival analysis datasets.
We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions whose gradients depend on numerically expensive expectations. Our main motivation is to accelerate the optimization of the regularized Cox partial-likelihood (the core model used in survival analysis), but the algorithm can be used in other settings as well. It is doubly stochastic in the sense that gradient steps are taken using stochastic gradient descent (SGD) with variance reduction, while the inner expectations are approximated by a Markov chain Monte Carlo (MCMC) algorithm. We derive conditions on the number of MCMC iterations that guarantee convergence, and we obtain a linear rate of convergence under strong convexity and a sublinear rate without this assumption. We show empirically that our algorithm improves on the state-of-the-art solver for the regularized Cox partial-likelihood on several datasets from survival analysis.
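To make the "doubly stochastic" idea concrete, the following is a minimal sketch, not the authors' implementation. It relies on the standard fact that the gradient of one Cox partial-likelihood term is -x_i + E[x_j], where j ranges over the risk set of subject i and is drawn with probability proportional to exp(x_j . theta). Everything else is an assumption made for illustration: the function names are hypothetical, the inner expectation is estimated with a plain Metropolis sampler using uniform proposals, the regularizer is an L1 penalty handled by soft-thresholding, and per-failure gradients at each snapshot are computed exactly and cached in an SVRG-style outer loop.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]

def exact_risk_mean(theta, X, risk):
    """Exact E[x_j] for j in risk, with weights proportional to exp(x_j . theta)."""
    s = [dot(X[j], theta) for j in risk]
    m = max(s)
    w = [math.exp(a - m) for a in s]
    z = sum(w)
    return [sum(w[k] * X[j][c] for k, j in enumerate(risk)) / z
            for c in range(len(theta))]

def mcmc_risk_mean(theta, X, risk, n_steps, rng):
    """Metropolis estimate of the same expectation (assumed sampler:
    uniform proposals over the risk set, no burn-in)."""
    j = rng.choice(risk)
    s_j = dot(X[j], theta)
    acc = [0.0] * len(theta)
    for _ in range(n_steps):
        k = rng.choice(risk)
        s_k = dot(X[k], theta)
        # Accept with probability min(1, exp(s_k - s_j)).
        if rng.random() < math.exp(min(0.0, s_k - s_j)):
            j, s_j = k, s_k
        acc = [a + x for a, x in zip(acc, X[j])]
    return [a / n_steps for a in acc]

def cox_loss(theta, X, times, events):
    """Average negative log partial-likelihood over observed failures."""
    fail = [i for i, e in enumerate(events) if e]
    total = 0.0
    for i in fail:
        s = [dot(X[j], theta) for j in range(len(X)) if times[j] >= times[i]]
        m = max(s)
        total += -dot(X[i], theta) + m + math.log(sum(math.exp(a - m) for a in s))
    return total / len(fail)

def doubly_stochastic_prox_svrg(X, times, events, step=0.1, l1=0.0,
                                n_phases=5, n_mcmc=50, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    fail = [i for i, e in enumerate(events) if e]
    risk = {i: [j for j in range(len(X)) if times[j] >= times[i]] for i in fail}
    theta = [0.0] * d
    for _ in range(n_phases):
        # Snapshot: exact per-failure gradients and their average (sketch choice).
        snap = {}
        for i in fail:
            mean = exact_risk_mean(theta, X, risk[i])
            snap[i] = [m - x for m, x in zip(mean, X[i])]
        full = [sum(snap[i][c] for i in fail) / len(fail) for c in range(d)]
        for _ in range(len(fail)):
            i = rng.choice(fail)  # first source of randomness: sampled term
            mean = mcmc_risk_mean(theta, X, risk[i], n_mcmc, rng)  # second: MCMC
            g = [m - x for m, x in zip(mean, X[i])]
            # Variance-reduced direction, then proximal (soft-threshold) step.
            v = [a - b + c for a, b, c in zip(g, snap[i], full)]
            theta = soft_threshold([t - step * a for t, a in zip(theta, v)],
                                   step * l1)
    return theta
```

Calling `doubly_stochastic_prox_svrg(X, times, events, l1=0.01)` on a list of covariate vectors, observed times, and event indicators returns a coefficient vector, and `cox_loss` can be used to compare it against a baseline such as the zero vector. The fixed `n_mcmc` is a simplification; the abstract states that convergence requires conditions on the number of MCMC iterations, which this sketch does not attempt to reproduce.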