Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization
The work addresses reproducibility issues in feature selection and is relevant to researchers in statistics and machine learning, though it is an incremental improvement on an existing method.
The paper tackles the instability of the Model-X knockoff procedure for feature selection by introducing simultaneous multi-knockoffs, which guarantee false discovery rate control and show substantially improved stability and power in experiments.
The Model-X knockoff procedure has recently emerged as a powerful approach for feature selection with statistical guarantees. The advantage of knockoffs is that, given a good model of the features X, we can identify salient features without knowing anything about how the outcome Y depends on X. An important drawback of knockoffs is their instability: running the procedure twice can yield very different sets of selected features, potentially leading to different conclusions. Addressing this instability is critical for obtaining reproducible and robust results. Here we present a generalization of the knockoff procedure that we call simultaneous multi-knockoffs. We show that multi-knockoffs guarantee false discovery rate (FDR) control and are substantially more stable and powerful than the standard (single) knockoff. Moreover, we propose a new algorithm based on entropy maximization for generating Gaussian multi-knockoffs. We validate the improved stability and power of multi-knockoffs in systematic experiments. We also illustrate how multi-knockoffs can improve the accuracy of detecting genetic mutations that are causally linked to phenotypes.
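For context, the selection step of the standard (single) knockoff filter can be sketched as below, using the widely known knockoff+ threshold: given one statistic W_j per feature (large positive values favour the real feature over its knockoff), choose the smallest threshold t at which the estimated false discovery proportion (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) falls to the target level q. This is only a minimal illustration of the baseline that the abstract refers to, not the multi-knockoff procedure or the entropy-maximization algorithm proposed here. The function name `knockoff_plus_select` and the toy statistics are hypothetical.

```python
import numpy as np


def knockoff_plus_select(W, q=0.1):
    """Select features with the standard knockoff+ threshold at target FDR q.

    W holds one knockoff statistic per feature. This is the single-knockoff
    baseline, not the multi-knockoff method described in the abstract.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target: keep features with W_j >= t.
            return np.flatnonzero(W >= t)
    # No threshold meets the target level, so nothing is selected.
    return np.array([], dtype=int)


# Toy statistics, made up purely for illustration.
W = np.array([5.0, 4.2, 3.9, 3.1, 2.8, 2.5, 2.2, 1.9, 1.5, 1.2, -0.3, 0.4])
selected = knockoff_plus_select(W, q=0.2)
```

Because the statistics W depend on a random draw of the knockoff variables, repeating the whole procedure with a fresh draw can change W and therefore the selected set, which is the instability the abstract describes.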