ME · ML · Aug 19, 2019

Model-free Feature Screening and FDR Control with Knockoff Features

arXiv:1908.06597v3 · 86 citations
AI Analysis

This paper addresses feature selection for high-dimensional data analysis with a robust, data-adaptive method. The contribution is incremental, since it builds on existing screening and knockoff techniques.

The paper tackles feature screening in ultra-high dimensional datasets by proposing a model-free method based on projection correlation, which works with heavy-tailed errors and multivariate responses, achieving sure screening and rank consistency. It also introduces a two-step approach with knockoff features to control false discovery rate (FDR), validated by numerical experiments and real data applications.

This paper proposes a model-free and data-adaptive feature screening method for ultra-high dimensional datasets. The proposed method is based on the projection correlation, which measures the dependence between two random vectors. This projection-correlation-based method does not require specifying a regression model and applies to data with heavy-tailed errors and multivariate responses. It enjoys both sure screening and rank consistency properties under weak assumptions. Further, a two-step approach is proposed to control the false discovery rate (FDR) in feature screening with the help of knockoff features. It can be shown that the proposed two-step approach enjoys both sure screening and FDR control if the pre-specified FDR level $\alpha$ is greater than or equal to $1/s$, where $s$ is the number of active features. The superior empirical performance of the proposed methods is demonstrated by various numerical experiments and real data applications.
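
To make the knockoff-based selection step concrete, here is a minimal sketch of the generic knockoff+ thresholding rule from the knockoff literature. It is an illustration under stated assumptions, not necessarily the paper's exact procedure: it assumes that a signed statistic `w[j]` has already been computed for each screened feature (for example, the feature's dependence score with the response minus that of its knockoff copy), and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(w, alpha):
    """Return the smallest t > 0 such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= alpha,
    or infinity if no such t exists (generic knockoff+ rule)."""
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        n_neg = np.sum(w <= -t)  # proxy for the number of false discoveries
        n_pos = np.sum(w >= t)   # number of features that would be selected
        if (1 + n_neg) / max(1, n_pos) <= alpha:
            return float(t)
    return float("inf")


def knockoff_select(w, alpha):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, alpha)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for seven features (illustrative values only).
w_example = [5, 4, 3, 2, -1, 0.5, -0.2]
selected_loose = knockoff_select(w_example, alpha=0.5)
selected_strict = knockoff_select(w_example, alpha=0.2)
```

Because the numerator of the estimated ratio is at least 1, any non-empty selection under this rule must contain at least $1/\alpha$ features. This gives some intuition for why a lower bound on $\alpha$ in terms of $s$ appears in the abstract's guarantee.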

