MLLGOCMay 4, 2021

Implicit differentiation for fast hyperparameter selection in non-smooth convex learning

arXiv:2105.01637v338 citations
Originality Incremental advance
AI Analysis

This provides a faster method for hyperparameter selection in machine learning models with non-smooth convex objectives, though it appears incremental as it builds on existing bilevel optimization and proximal methods.

The paper tackles hyperparameter optimization in non-smooth convex learning by developing first-order methods using implicit differentiation, showing that forward-mode differentiation of proximal algorithms yields converging Jacobians and leveraging non-smoothness for faster computation. Results on regression and classification demonstrate computational benefits, particularly with multiple hyperparameters.

Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non-smoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved approximately. Results on regression and classification problems reveal computational benefits for hyperparameter optimization, especially when multiple hyperparameters are required.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes