Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding
This addresses variable selection for statisticians by offering a computationally feasible alternative to hard-thresholding methods, though it is incremental as it builds on existing convex optimization approaches.
The paper tackles the problem of variable selection in linear models by proposing the Least Absolute Gradient Selector (LAGS), a convex method that mimics the discrete selection of l0 regularization, achieving consistency and bounded prediction error up to a logarithmic factor in simulations.
Variable selection in linear models plays a pivotal role in modern statistics. Hard-thresholding methods such as $l_0$ regularization are theoretically ideal but computationally infeasible. In this paper, we propose a new approach, called the LAGS, short for "least absulute gradient selector", to this challenging yet interesting problem by mimicking the discrete selection process of $l_0$ regularization. To estimate $β$ under the influence of noise, we consider, nevertheless, the following convex program [\hatβ = \textrm{arg min}\frac{1}{n}\|X^{T}(y - Xβ)\|_1 + λ_n\sum_{i = 1}^pw_i(y;X;n)|β_i|] $λ_n > 0$ controls the sparsity and $w_i > 0$ dependent on $y, X$ and $n$ is the weights on different $β_i$; $n$ is the sample size. Surprisingly, we shall show in the paper, both geometrically and analytically, that LAGS enjoys two attractive properties: (1) LAGS demonstrates discrete selection behavior and hard thresholding property as $l_0$ regularization by strategically chosen $w_i$, we call this property "pseudo-hard thresholding"; (2) Asymptotically, LAGS is consistent and capable of discovering the true model; nonasymptotically, LAGS is capable of identifying the sparsity in the model and the prediction error of the coefficients is bounded at the noise level up to a logarithmic factor---$\log p$, where $p$ is the number of predictors. Computationally, LAGS can be solved efficiently by convex program routines for its convexity or by simplex algorithm after recasting it into a linear program. The numeric simulation shows that LAGS is superior compared to soft-thresholding methods in terms of mean squared error and parsimony of the model.