CO · ML · Jan 9, 2017

MEBoost: Variable Selection in the Presence of Measurement Error

Benjamin Brown, Timothy Weaver, Julian Wolfson

arXiv:1701.02349v3 · 22 citations

Originality: Incremental advance

AI Analysis

This paper addresses a practical problem in statistics and in applied fields such as clinical trials, where measurement error is common, but it is an incremental improvement over existing methods such as CoCoLasso.

The paper tackles variable selection in regression when covariates have measurement error by proposing MEBoost, an iterative algorithm that corrects for this error. Simulation results show MEBoost outperforms CoCoLasso and naive Lasso in reducing prediction error and improving selection accuracy as error increases, with an application to a clinical nutrition trial.

We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, MEBoost, follows a path defined by estimating equations that correct for covariate measurement error. Via simulation, we evaluated our method and compared its performance to the recently proposed Convex Conditioned Lasso (CoCoLasso) and to the "naive" Lasso, which does not correct for measurement error. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy was least pronounced when using MEBoost. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition where several variables are based on self-report and hence measured with error.
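
To make the idea of "a path defined by estimating equations" concrete, here is a minimal Python sketch of one way such an algorithm could look. It is not the authors' implementation. It assumes a linear model with additive measurement error W = X + U whose covariance (sigma_u below) is known, and it uses the standard corrected score for that setting. The function name meboost_linear, the step size, the fixed number of steps, and the simulated data are all illustrative assumptions; how to choose the stopping point is left open.

```python
import numpy as np


def meboost_linear(W, y, sigma_u, step=0.01, n_steps=300):
    """Boosting-style coefficient path driven by a corrected score.

    Hypothetical sketch: assumes y = X beta + noise, observed W = X + U,
    with U having known covariance sigma_u (an assumption, not from the source).
    """
    n, p = W.shape
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        # Corrected score for the linear model: the naive score
        # W'(y - W beta)/n is biased because W'W/n overstates X'X/n
        # by sigma_u; adding sigma_u @ beta removes that bias.
        score = W.T @ (y - W @ beta) / n + sigma_u @ beta
        # Move only the coordinate whose estimating equation is
        # furthest from zero, by a small fixed step.
        j = int(np.argmax(np.abs(score)))
        beta[j] += step * np.sign(score[j])
        path.append(beta.copy())
    return beta, np.array(path)


# Illustrative simulated data (assumed values, not from the paper).
rng = np.random.default_rng(0)
n, p = 1000, 10
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]
X = rng.normal(size=(n, p))
sigma_u = 0.25 * np.eye(p)                   # measurement-error covariance
W = X + rng.normal(scale=0.5, size=(n, p))   # error-prone covariates
y = X @ beta_true + rng.normal(size=n)

beta_hat, path = meboost_linear(W, y, sigma_u)
selected = np.flatnonzero(beta_hat)          # covariates that entered the path
```

Covariates whose coefficients never leave zero along the path are treated as unselected, so stopping earlier yields a sparser model. Setting sigma_u to a zero matrix reduces the update to an uncorrected, naive version of the same path.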

