ML, LG · Mar 29, 2019

The False Positive Control Lasso

Erik Drysdale, Yingwei Peng, Timothy P. Hanna, Paul Nguyen, Anna Goldenberg

arXiv:1903.12584v13.22 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper provides a finite-sample method for false positive control in sparse regression, addressing limitations of existing approaches that rely on asymptotics or restrictive assumptions. It is an incremental advance because it builds on the SQRT-Lasso.

The paper tackles the problem of controlling false positives in Lasso regression on high-dimensional data. It recasts the SQRT-Lasso as a method for this purpose and extends it to generalized linear models, with an approximation error that the authors find negligible in practical settings under a strict mutual incoherence condition.
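To make the recasting concrete, here is a minimal sketch under our own assumptions, not the authors' code. We take the SQRT-Lasso objective as ||y - X b||_2 / sqrt(n) + lam * ||b||_1, rescale every column to ||x_j||_2 = sqrt(n), and approximate the null statistic by a standard normal. When y is pure noise, column j violates the KKT condition at b = 0 if |x_j^T y| / (sqrt(n) * ||y||_2) > lam. This is a self-normalized sum, so it does not depend on the unknown noise scale. Choosing lam = Phi^{-1}(1 - k/(2p)) / sqrt(n) then makes the expected number of such violations roughly k. The helper names (fpc_lambda, null_kkt_score, mean_null_exceedances) are hypothetical. Counting KKT violations at b = 0 is also only a proxy for how many variables a full SQRT-Lasso fit selects.

```python
import math
import random
from statistics import NormalDist


def fpc_lambda(p: int, n: int, k: float) -> float:
    """Hypothetical helper: penalty lam for ||y - X b||_2 / sqrt(n) + lam * ||b||_1
    chosen so that, under a Gaussian approximation, about k of p pure-noise columns
    violate the KKT condition at b = 0. Assumes columns scaled to ||x_j||_2 = sqrt(n)."""
    if not 0 < k < p:
        raise ValueError("need 0 < k < p")
    return NormalDist().inv_cdf(1.0 - k / (2.0 * p)) / math.sqrt(n)


def null_kkt_score(x, eps):
    """Self-normalized score |x^T eps| / (sqrt(n) * ||eps||_2) after rescaling x to
    ||x||_2 = sqrt(n); the rescaling cancels the sqrt(n), leaving |cos(angle)|."""
    x_norm = math.sqrt(sum(v * v for v in x))
    eps_norm = math.sqrt(sum(e * e for e in eps))
    inner = sum(xi * ei for xi, ei in zip(x, eps))
    return abs(inner) / (x_norm * eps_norm)


def mean_null_exceedances(n, p, k, reps=50, seed=0):
    """Monte Carlo average of how many of p pure-noise columns exceed fpc_lambda(p, n, k)."""
    rng = random.Random(seed)
    lam = fpc_lambda(p, n, k)
    total = 0
    for _ in range(reps):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]  # y is pure noise: no true signals
        for _ in range(p):
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            if null_kkt_score(x, eps) > lam:
                total += 1  # this noise column would "want to enter" at b = 0
    return total / reps


if __name__ == "__main__":
    print("lambda:", round(fpc_lambda(p=200, n=100, k=4), 4))
    print("average null exceedances:", mean_null_exceedances(n=100, p=200, k=4))
```

With k = 4 the simulated average should land close to 4. Because the score is scale-free, multiplying the noise by any constant leaves these counts unchanged. This pivotal property is what lets a single lam target an expected number of false positives without estimating the noise variance.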

In high-dimensional settings where only a small number of regressors are expected to be important, the Lasso estimator can be used to obtain a sparse solution vector, with the expectation that most of the non-zero coefficients are associated with true signals. Several approaches have been developed to control the inclusion of false predictors with the Lasso, but they are limited by their reliance on asymptotic theory, the need to empirically estimate terms based on theoretical quantities, the assumption of a continuous response class with Gaussian noise and design matrices, or high computational costs. In this paper we show how (1) an existing model, the SQRT-Lasso, can be recast as a method of controlling the number of expected false positives, (2) a similar estimator can be used for all other generalized linear model classes, and (3) this approach can be fit with existing fast Lasso optimization solvers. To our knowledge, our justification for false positive control using randomly weighted self-normalized sum theory is novel. Moreover, our estimator's properties hold in finite samples up to an approximation error, which we find to be negligible in practical settings under a strict mutual incoherence condition.
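Point (3) can be illustrated with a small sketch. One known way to fit the SQRT-Lasso with an ordinary Lasso solver is Sun and Zhang's scaled-Lasso alternation. It estimates the noise level as sigma = ||y - X b||_2 / sqrt(n), solves a standard Lasso with penalty lam * sigma, and repeats. Minimizing the scaled-Lasso objective jointly over (b, sigma) reduces to the SQRT-Lasso objective. We swap in that alternation and a plain-Python coordinate-descent solver purely for illustration; the paper's own reduction and solver may differ. The function names, the simulated data, and the target k = 1 are hypothetical. The penalty reuses the Gaussian-approximation calibration lam = Phi^{-1}(1 - k/(2p)) / sqrt(n), which is our assumption.

```python
import math
import random
from statistics import NormalDist


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, beta=None, max_sweeps=500, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 (no intercept).
    X is a list of n rows, each a list of p floats; beta optionally warm-starts the solver."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual that excludes b_j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residual in sync with beta
                beta[j] = new_bj
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return beta, resid


def sqrt_lasso(X, y, lam, max_outer=100, tol=1e-9):
    """SQRT-Lasso, min_b ||y - X b||_2 / sqrt(n) + lam * ||b||_1, via the scaled-Lasso
    alternation: sigma = ||r||_2 / sqrt(n), then an ordinary Lasso with penalty lam * sigma."""
    n = len(y)
    sigma = math.sqrt(sum(v * v for v in y) / n)
    beta = None
    for _ in range(max_outer):
        beta, resid = lasso_cd(X, y, lam * sigma, beta)
        new_sigma = math.sqrt(sum(r * r for r in resid) / n)
        converged = abs(new_sigma - sigma) <= tol * max(sigma, 1e-12)
        sigma = new_sigma
        if converged:
            break
    return beta, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 20
    cols = []
    for _ in range(p):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = math.sqrt(sum(v * v for v in c) / n)
        cols.append([v / s for v in c])  # rescale so ||x_j||_2 = sqrt(n)
    X = [[cols[j][i] for j in range(p)] for i in range(n)]
    y = [2.0 * X[i][0] - 2.0 * X[i][1] + rng.gauss(0.0, 1.0) for i in range(n)]  # two true signals
    k = 1.0  # hypothetical target: about one expected false positive
    lam = NormalDist().inv_cdf(1.0 - k / (2.0 * p)) / math.sqrt(n)
    beta, sigma = sqrt_lasso(X, y, lam)
    print("selected:", [j for j, b in enumerate(beta) if b != 0.0], "sigma:", round(sigma, 3))
```

The selected set should include columns 0 and 1, the two simulated signals. It may also include a few noise columns. At convergence, the fitted b satisfies the SQRT-Lasso optimality condition X^T r / (sqrt(n) * ||r||_2) in lam * d||b||_1, where r = y - X b, which makes a convenient numerical sanity check. The same alternation works with any Lasso solver that minimizes (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1, which is why fast existing solvers can be reused.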
