ML LG OC · Jun 2, 2021

Smooth Bilevel Programming for Sparse Regularization

arXiv:2106.01429v214.422 citations · Has Code

Originality: Incremental advance

AI Analysis

The method gives machine learning practitioners a single, efficient solver for a range of sparse regularization problems. The advance is incremental, since it builds on existing IRLS methods.

The paper addresses sparsity-enforcing regression problems such as Lasso and group Lasso. It proposes a smooth bilevel programming method based on a reparametrization of iteratively reweighted least squares (IRLS). In the paper's numerical benchmarks, the method converges efficiently and ranks among the top performers across a range of regularization types and design matrices.

Iteratively reweighted least squares (IRLS) is a popular approach to solving sparsity-enforcing regression problems in machine learning. State-of-the-art approaches are more efficient but typically rely on specific coordinate pruning schemes. In this work, we show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel resolution (instead of an alternating scheme), achieves top performance across a wide range of sparsity-inducing regularizers (such as Lasso, group Lasso and trace norm regularizations), regularization strengths (including hard constraints), and design matrices (ranging from correlated designs to differential operators). Like IRLS, our method only involves solving linear systems, but in sharp contrast, it corresponds to the minimization of a smooth function. Although this function is non-convex, we show that it has no spurious minima and that its saddle points are "ridable", so that there always exists a descent direction. We thus advocate the use of a BFGS quasi-Newton solver, which makes our approach simple, robust and efficient. We perform a numerical benchmark of the convergence speed of our algorithm against state-of-the-art solvers for Lasso, group Lasso, trace norm and linearly constrained problems. These results highlight the versatility of our approach, removing the need to use different solvers depending on the specifics of the ML problem under study.
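To make the idea in the abstract concrete, here is a minimal sketch for the Lasso case only. It is not the authors' released code. It assumes the reparametrization is the Hadamard product x = u ⊙ v, for which ||x||_1 equals the minimum of (||u||² + ||v||²)/2 over all such factorizations. The abstract does not spell this formula out, so treat it as an assumption. Under that assumption, the inner minimization over u is a ridge regression (one linear system), and the outer function of v is smooth and can be handed to SciPy's BFGS. The function names `inner_solve`, `value_and_grad` and `lasso_bilevel` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def inner_solve(v, A, y, lam):
    """Inner problem: ridge regression in u for fixed v (one linear system)."""
    Av = A * v  # same as A @ diag(v): column j is scaled by v[j]
    gram = Av.T @ Av + lam * np.eye(A.shape[1])
    return np.linalg.solve(gram, Av.T @ y)


def value_and_grad(v, A, y, lam):
    """Outer function f(v) = min_u G(u, v) and its gradient.

    Assumed objective (Lasso with x = u * v):
        G(u, v) = ||A (u * v) - y||^2 / (2 lam) + ||u||^2 / 2 + ||v||^2 / 2
    The gradient is the partial derivative of G in v at the inner minimizer.
    """
    u = inner_solve(v, A, y, lam)
    r = A @ (u * v) - y
    f = (r @ r) / (2 * lam) + 0.5 * (u @ u) + 0.5 * (v @ v)
    grad = v + u * (A.T @ r) / lam
    return f, grad


def lasso_bilevel(A, y, lam):
    """Minimize f with BFGS and return the Lasso estimate x = u * v."""
    v0 = np.ones(A.shape[1])  # v = 0 is a stationary point, so start away from it
    res = minimize(value_and_grad, v0, args=(A, y, lam), jac=True, method="BFGS")
    return inner_solve(res.x, A, y, lam) * res.x


if __name__ == "__main__":
    # Small synthetic problem; sizes and noise level are arbitrary choices.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 15))
    x_true = np.zeros(15)
    x_true[:3] = [2.0, -1.5, 1.0]
    y = A @ x_true + 0.01 * rng.standard_normal(30)
    lam = 0.1 * np.max(np.abs(A.T @ y))
    print(np.round(lasso_bilevel(A, y, lam), 3))
```

The returned vector targets the minimizer of ||Ax - y||²/2 + lam ||x||_1. The dense `np.linalg.solve` call is only reasonable for small problems; a larger problem would need a different linear solver, which this sketch does not attempt.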

