ML LGJan 27, 2025

SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning

Jingruo Sun, Zachary Frangella, Madeleine Udell

arXiv:2501.15941v110.33 citationsh-index: 7

Originality Highly original

AI Analysis

This provides a faster and more scalable solution for data-intensive fields such as genomics and advertising, though it appears incremental as it builds on stochastic variance reduction methods.

The paper tackles the problem of slow convergence in large-scale statistical learning due to ill-conditioned objectives and non-smooth regularizers by proposing the SAPPHIRE algorithm, which achieves condition-number-free linear convergence and converges up to 20 times faster than existing methods like Catalyst, SAGA, and SVRG in experiments on lasso and logistic regression.

Regularized empirical risk minimization (rERM) has become important in data-intensive fields such as genomics and advertising, with stochastic gradient methods typically used to solve the largest problems. However, ill-conditioned objectives and non-smooth regularizers undermine the performance of traditional stochastic gradient methods, leading to slow convergence and significant computational costs. To address these challenges, we propose the $\texttt{SAPPHIRE}$ ($\textbf{S}$ketching-based $\textbf{A}$pproximations for $\textbf{P}$roximal $\textbf{P}$reconditioning and $\textbf{H}$essian $\textbf{I}$nexactness with Variance-$\textbf{RE}$educed Gradients) algorithm, which integrates sketch-based preconditioning to tackle ill-conditioning and uses a scaled proximal mapping to minimize the non-smooth regularizer. This stochastic variance-reduced algorithm achieves condition-number-free linear convergence to the optimum, delivering an efficient and scalable solution for ill-conditioned composite large-scale convex machine learning problems. Extensive experiments on lasso and logistic regression demonstrate that $\texttt{SAPPHIRE}$ often converges $20$ times faster than other common choices such as $\texttt{Catalyst}$, $\texttt{SAGA}$, and $\texttt{SVRG}$. This advantage persists even when the objective is non-convex or the preconditioner is infrequently updated, highlighting its robust and practical effectiveness.

View on arXiv PDF

Similar