Self-concordant smoothing in proximal quasi-Newton algorithms for large-scale convex composite optimization
This work addresses optimization challenges in machine learning for problems with nonsmooth penalties, offering incremental improvements in algorithm design for researchers and practitioners.
The paper tackles large-scale convex composite optimization by introducing self-concordant smoothing to handle nonsmooth terms like l1-regularization, resulting in proximal quasi-Newton algorithms (Prox-N-SCORE and Prox-GGN-SCORE) that demonstrate efficiency against state-of-the-art approaches in numerical experiments on synthetic and real data.
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other nonsmooth. The key highlight is a natural property of the resulting problem's structure that yields a variable metric selection method and a step length rule especially suited to proximal quasi-Newton algorithms. Also, we efficiently handle specific structures promoted by the nonsmooth term, such as l1-regularization and group lasso penalties. A convergence analysis for the class of proximal quasi-Newton methods covered by our framework is presented. In particular, we obtain guarantees, under standard assumptions, for two algorithms: Prox-N-SCORE (a proximal Newton method) and Prox-GGN-SCORE (a proximal generalized Gauss-Newton method). The latter uses a low-rank approximation of the Hessian inverse, reducing most of the cost of matrix inversion and making it effective for overparameterized machine learning models. Numerical experiments on synthetic and real data demonstrate the efficiency of both algorithms against state-of-the-art approaches. A Julia implementation is publicly available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.