ML · LG · OT · Oct 20, 2019

Improved error rates for sparse (group) learning with Lipschitz loss functions

arXiv:1910.08880v7 · 3 citations
Originality: Incremental advance
AI Analysis

This work provides theoretical guarantees for sparse learning methods, which is important for researchers in high-dimensional statistics and machine learning, though it is incremental as it builds on existing frameworks to unify and improve bounds.

The paper tackles the problem of deriving high-dimensional L2 estimation error bounds for sparse and group-sparse estimators with Lipschitz loss functions, achieving optimal minimax rates for L1 and Slope regularizations and improved rates for Group L1-L2 regularization, with bounds scaling as (k*/n) log(p/k*) and (s*/n) log(G/s*) + m*/n, respectively.
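To make the comparison between the two rates concrete, the short sketch below evaluates both expressions for one hypothetical problem size. The values of `n`, `p`, `G`, `s_star` and `m_star` are illustrative assumptions, not figures from the paper, and constants hidden in the bounds are ignored, so only the relative order of magnitude is meaningful.

```python
import math

def l1_slope_rate(k_star, n, p):
    """Rate (k*/n) log(p/k*) quoted for L1 and Slope regularization."""
    return (k_star / n) * math.log(p / k_star)

def group_rate(s_star, m_star, n, G):
    """Rate (s*/n) log(G/s*) + m*/n quoted for Group L1-L2 regularization."""
    return (s_star / n) * math.log(G / s_star) + m_star / n

# Hypothetical setting: 1,000 groups of 10 coefficients each, with the
# signal confined to 5 fully active groups (strong group sparsity).
n, p, G = 1_000, 10_000, 1_000
s_star, m_star = 5, 50
k_star = m_star  # every coefficient in the active groups is nonzero

l1_value = l1_slope_rate(k_star, n, p)
group_value = group_rate(s_star, m_star, n, G)
print(f"L1 / Slope rate: {l1_value:.4f}")
print(f"Group L1-L2 rate: {group_value:.4f}")
```

In this setting the group rate is smaller because the logarithmic factor multiplies the number of active groups rather than the number of active coefficients.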

We study a family of sparse estimators defined as minimizers of an empirical Lipschitz loss function -- examples include the hinge loss, the logistic loss and the quantile regression loss -- with a convex, sparse or group-sparse regularization. In particular, we consider the L1 norm on the coefficients, its sorted Slope version, and the Group L1-L2 extension. We propose a new theoretical framework that uses common assumptions in the literature to simultaneously derive new high-dimensional L2 estimation upper bounds for all three regularization schemes. For L1 and Slope regularizations, our bounds scale as $(k^*/n) \log(p/k^*)$ -- $n \times p$ is the size of the design matrix and $k^*$ the dimension of the theoretical loss minimizer $\boldsymbol{\beta}^*$ -- and match the optimal minimax rate achieved for the least-squares case. For Group L1-L2 regularization, our bounds scale as $(s^*/n) \log\left( G / s^* \right) + m^* / n$ -- $G$ is the total number of groups and $m^*$ the number of coefficients in the $s^*$ groups which contain $\boldsymbol{\beta}^*$ -- and improve over the least-squares case. We show that, when the signal is strongly group-sparse, Group L1-L2 is superior to L1 and Slope. In addition, we adapt our approach to the sub-Gaussian linear regression framework and reach the optimal minimax rate for Lasso, and an improved rate for Group-Lasso. Finally, we release an accelerated proximal algorithm that computes the nine main convex estimators of interest when the number of variables is of the order of 100,000s.
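As a rough illustration of how such estimators can be computed, here is a minimal sketch of plain (non-accelerated) proximal gradient descent for the logistic loss with either an L1 or a Group L1-L2 penalty. It is not the authors' released algorithm: the function names, the step-size rule, the synthetic data and the value of `lam` are all assumptions made for this example, and the Slope penalty is omitted because its proximal operator requires an additional sorting step.

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group_l2(v, t, groups):
    """Group soft-thresholding: proximal operator of t * sum_g ||v_g||_2."""
    out = np.zeros_like(v)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > t:
            out[idx] = (1.0 - t / norm) * v[idx]
    return out

def logistic_grad(X, y, beta):
    """Gradient of mean(log(1 + exp(-y * X @ beta))) for labels y in {-1, +1}."""
    margins = y * (X @ beta)
    weights = -y / (1.0 + np.exp(margins))
    return X.T @ weights / X.shape[0]

def proximal_gradient(X, y, prox, lam, n_iter=500):
    """Iterate beta <- prox(beta - step * grad, step * lam) with a fixed step."""
    n = X.shape[0]
    # The logistic gradient is Lipschitz with constant ||X||_2^2 / (4 n).
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = prox(beta - step * logistic_grad(X, y, beta), step * lam)
    return beta

# Synthetic example: 20 coefficients in 5 groups of 4, one active group.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
beta_true = np.zeros(20)
beta_true[:4] = 2.0
y = np.sign(X @ beta_true + 0.1 * rng.standard_normal(200))
groups = [np.arange(i, i + 4) for i in range(0, 20, 4)]

beta_l1 = proximal_gradient(X, y, prox_l1, lam=0.05)
beta_group = proximal_gradient(
    X, y, lambda v, t: prox_group_l2(v, t, groups), lam=0.05
)
```

Passing the proximal operator as an argument keeps the loop identical across penalties, which mirrors the abstract's point that one framework covers several regularization schemes.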
