Post-selection inference with HSIC-Lasso
The work addresses unreliable feature selection, a practical concern for machine learning practitioners, though the contribution is incremental: it builds on the existing HSIC-Lasso and truncated Gaussian frameworks.
The paper tackles flawed feature selection in non-linear or high-dimensional data by proposing a post-selection inference method based on HSIC-Lasso. In experiments on artificial and real-world data, the method shows tight control of the type-I error.
Detecting influential features in non-linear and/or high-dimensional data is a challenging and increasingly important task in machine learning. Variable selection methods, as well as post-selection inference, have thus been gaining much attention. Indeed, inference on the selected features can be significantly flawed when the selection procedure is not accounted for. We propose a selective inference procedure using the so-called model-free "HSIC-Lasso", based on the framework of truncated Gaussians combined with the polyhedral lemma. We then develop an algorithm that allows for low computational costs and provides a selection of the regularisation parameter. The performance of our method is illustrated by experiments on both artificial and real-world data, which emphasise a tight control of the type-I error, even for small sample sizes.
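To make the "truncated Gaussians combined with the polyhedral lemma" step more concrete, the following is a minimal sketch of the generic polyhedral-lemma computation, not the paper's own algorithm. It assumes a Gaussian response y with known covariance Sigma, a selection event that can be written as {A y <= b}, and a test direction eta; all function and variable names are hypothetical and chosen only for illustration. How HSIC-Lasso selection is mapped to such a polyhedron is not shown here.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def truncation_interval(A, b, eta, y, Sigma):
    """Return (t, var, lower, upper) for the statistic t = eta^T y.

    Conditional on the selection event {A y <= b} and on the part of y
    that is independent of t, the statistic t is a Gaussian with
    variance var truncated to [lower, upper].
    """
    Sigma_eta = matvec(Sigma, eta)
    var = dot(eta, Sigma_eta)
    c = [s / var for s in Sigma_eta]                # direction along which t moves y
    t = dot(eta, y)
    z = [y_i - c_i * t for y_i, c_i in zip(y, c)]   # residual, independent of t
    Ac, Az = matvec(A, c), matvec(A, z)
    lower, upper = -math.inf, math.inf
    for ac_j, az_j, b_j in zip(Ac, Az, b):
        if ac_j < 0:
            lower = max(lower, (b_j - az_j) / ac_j)
        elif ac_j > 0:
            upper = min(upper, (b_j - az_j) / ac_j)
    return t, var, lower, upper


def selective_p_value(A, b, eta, y, Sigma):
    """One-sided p-value for H0: eta^T mu = 0 under the truncated Gaussian."""
    t, var, lower, upper = truncation_interval(A, b, eta, y, Sigma)
    sd = math.sqrt(var)
    numerator = norm_cdf(upper / sd) - norm_cdf(t / sd)
    denominator = norm_cdf(upper / sd) - norm_cdf(lower / sd)
    return numerator / denominator


# Toy example (assumed data): feature 1 is "selected" because y[0] >= y[1],
# which is the polyhedron -y[0] + y[1] <= 0.
y = [2.0, 0.5]
Sigma = [[1.0, 0.0], [0.0, 1.0]]
A = [[-1.0, 1.0]]
b = [0.0]
eta = [1.0, 0.0]

p_selective = selective_p_value(A, b, eta, y, Sigma)
p_naive = 1.0 - norm_cdf(dot(eta, y))  # ignores the selection step
```

In this toy case the selective p-value is larger than the naive one, which reflects the point made in the abstract: ignoring the selection step overstates significance. The ratio of CDF differences can be numerically unstable far in the tails, so a practical implementation would likely need a more careful evaluation of the truncated Gaussian.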