Monte Carlo Simulation for Lasso-Type Problems by Estimator Augmentation
The paper provides a Monte Carlo method for statistical inference in Lasso-type models, addressing a bottleneck for researchers and practitioners working on sparse regression. The contribution is incremental, however, since it builds on existing Monte Carlo techniques.
The paper tackles the problem of determining the sampling distribution of Lasso estimators. This is difficult because the estimators are defined through an optimization problem and are sparse, with many components exactly zero. The paper shows that the joint distribution of the estimator and its subgradient is tractable and has a closed-form density, which enables Monte Carlo simulation for inference. Numerical examples demonstrate the approach's flexibility, and nonasymptotic bounds support its validity even in high-dimensional settings.
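The following standard Karush-Kuhn-Tucker (KKT) derivation suggests why pairing the estimator with its subgradient helps. It assumes notation not fixed by the text above: response $y$, $n\times p$ design $X$, tuning parameter $\lambda$, and a Lasso objective scaled by $1/(2n)$. The paper's exact parameterization may differ, so treat this as intuition only:

$$
\hat\beta = \arg\min_{\beta}\ \tfrac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1,
\qquad
\tfrac{1}{n}X^\top(y - X\hat\beta) = \lambda S,
\qquad
S_j \begin{cases} = \operatorname{sign}(\hat\beta_j), & \hat\beta_j \neq 0,\\ \in [-1, 1], & \hat\beta_j = 0. \end{cases}
$$

Substituting $y = X\beta + \varepsilon$ and writing $C = X^\top X / n$ gives

$$
U := \tfrac{1}{n}X^\top\varepsilon = C(\hat\beta - \beta) + \lambda S .
$$

If $\varepsilon \sim N_n(0, \sigma^2 I_n)$, then $U \sim N_p(0, \sigma^2 C / n)$, while the right-hand side is an explicit function of $(\hat\beta, S)$. When $p \le n$ and $C$ is positive definite, the Lasso solution is unique, so this relation is a one-to-one correspondence between $U$ and $(\hat\beta, S)$. A change of variables from the Gaussian density of $U$ then yields a density for the augmented pair. The $p > n$ case requires more care.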
Regularized linear regression under the $\ell_1$ penalty, such as the Lasso, has been shown to be effective in variable selection and sparse modeling. The sampling distribution of an $\ell_1$-penalized estimator $\hat{\beta}$ is hard to determine for two reasons. The estimator is defined by an optimization problem that in general can only be solved numerically, and many of its components may be exactly zero. Let $S$ be the subgradient of the $\ell_1$ norm of the coefficient vector $\beta$ evaluated at $\hat{\beta}$. We find that the joint sampling distribution of $\hat{\beta}$ and $S$, together called an augmented estimator, is much more tractable and has a closed-form density under a normal error distribution in both low-dimensional ($p\leq n$) and high-dimensional ($p>n$) settings. Given $\beta$ and the error variance $\sigma^2$, one may employ standard Monte Carlo methods, such as Markov chain Monte Carlo and importance sampling, to draw samples from the distribution of the augmented estimator and calculate expectations with respect to the sampling distribution of $\hat{\beta}$. We develop a few concrete Monte Carlo algorithms and demonstrate with numerical examples that our approach may offer substantial advantages and great flexibility in studying sampling distributions in $\ell_1$-penalized linear regression. We also establish nonasymptotic bounds on the difference between the true sampling distribution of $\hat{\beta}$ and its estimator obtained by plugging in estimated parameters. These bounds justify the validity of Monte Carlo simulation from an estimated sampling distribution even when $p\gg n\to \infty$.
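For comparison, here is a minimal sketch of the brute-force baseline that estimator augmentation is meant to improve on. It is not one of the paper's algorithms. Each draw simulates a response $y^* = X\beta + \varepsilon^*$, refits the Lasso, and recovers $S$ from the KKT condition $\frac{1}{n}X^\top(y - X\hat\beta) = \lambda S$. The code assumes NumPy, the $1/(2n)$ objective scaling, and plain cyclic coordinate descent as the solver. The helper names (`lasso_cd`, `augmented_estimator`, `simulate_augmented`) are hypothetical. Passing estimates of $\beta$ and $\sigma$ instead of true values would give the plug-in estimated sampling distribution discussed above.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) ||X_j||^2 for each column
    r = np.array(y, dtype=float)       # current residual y - X b (a copy)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # (1/n) X_j^T (residual with coordinate j's contribution added back)
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            delta = b[j] - old
            if delta != 0.0:
                r -= X[:, j] * delta  # keep residual in sync with b
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves b
            break
    return b


def augmented_estimator(X, y, lam):
    """Return (b, S): the Lasso fit and the subgradient implied by KKT."""
    n = X.shape[0]
    b = lasso_cd(X, y, lam)
    S = X.T @ (y - X @ b) / (n * lam)  # (1/n) X^T (y - X b) = lam * S
    return b, S


def simulate_augmented(X, beta, sigma, lam, n_rep, rng):
    """Direct Monte Carlo: draw y* = X beta + eps*, refit, record (b, S)."""
    n, p = X.shape
    B = np.empty((n_rep, p))
    Ss = np.empty((n_rep, p))
    for k in range(n_rep):
        y_star = X @ beta + sigma * rng.standard_normal(n)
        B[k], Ss[k] = augmented_estimator(X, y_star, lam)
    return B, Ss


# Illustrative (made-up) setting: two true signals, eight true zeros.
rng = np.random.default_rng(0)
n, p, lam = 50, 10, 0.1
X = rng.standard_normal((n, p))
beta = np.r_[2.0, -1.5, np.zeros(p - 2)]
B, Ss = simulate_augmented(X, beta, sigma=1.0, lam=lam, n_rep=200, rng=rng)

# Monte Carlo estimate of P(beta_hat_j != 0) under the sampling distribution.
print("P(beta_hat_j != 0):", (B != 0).mean(axis=0))
```

Each draw here costs a full Lasso fit. According to the abstract, the augmented approach instead samples $(\hat\beta, S)$ directly from its closed-form density, for example by MCMC or importance sampling, and that is the source of the efficiency and flexibility it reports.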