LGDSSTMLJul 14, 2023

Performance of $\ell_1$ Regularization for Sparse Convex Optimization

arXiv:2307.07405v11 citationsh-index: 17
Originality Incremental advance
AI Analysis

This addresses a foundational problem in machine learning by providing theoretical justification for widely used regularization methods in sparse optimization, though it is incremental as it builds on existing weak submodularity arguments.

The paper tackles the lack of theoretical guarantees for Group LASSO in sparse convex optimization beyond statistical settings, showing that with sufficient regularization, its minimizer matches the feature selection of Orthogonal Matching Pursuit under restricted strong convexity and smoothness conditions. This result provides the first theoretical explanation for its empirical success and generalizes guarantees to the column subset selection problem for general loss functions.

Despite widespread adoption in practice, guarantees for the LASSO and Group LASSO are strikingly lacking in settings beyond statistical problems, and these algorithms are usually considered to be a heuristic in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances assuming only restricted strong convexity and smoothness. Our result also generalizes provable guarantees for the Sequential Attention algorithm, which is a feature selection algorithm inspired by the attention mechanism proposed by Yasuda et al. As an application of our result, we give new results for the column subset selection problem, which is well-studied when the loss is the Frobenius norm or other entrywise matrix losses. We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes