ML LG ME Jun 11, 2022

Feature Selection using e-values

Subhabrata Majumdar, Snigdhansu Chatterjee

arXiv:2206.05391v22.14 citations h-index: 19 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational bottleneck in feature selection for researchers and practitioners, though the advance is incremental since it builds on existing parametric models.

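The paper tackles feature selection in supervised parametric models by introducing e-values, a scalar measure of how close the sampling distribution of parameter estimates from a model trained on a subset of features is to that of the model trained on all features. The procedure requires fitting only the full model and evaluating p+1 models, instead of fitting and evaluating 2^p models, and the authors present it as a promising general alternative to existing model-specific methods.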

In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. The e-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and provide consistency results. For a $p$-dimensional feature space, this procedure requires fitting only the full model and evaluating $p+1$ models, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. Through experiments across several model settings and synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
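
The procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch under several assumptions that the abstract does not state: ordinary least squares as the parametric model, Mahalanobis depth as the data depth, and a one-step multiplier bootstrap standing in for the fast resampling step. The function names, the inflation factor `tau`, and the selection rule (keep feature j when zeroing its coefficient pushes the e-value below that of the full model) are hypothetical choices made for this example and should not be read as the authors' implementation.

```python
import numpy as np

def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the cloud `reference`."""
    mu = reference.mean(axis=0)
    prec = np.linalg.pinv(np.cov(reference, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, prec, diff)  # squared distances
    return 1.0 / (1.0 + d2)

def resample_coefs(X, resid, beta_hat, n_boot, tau, rng):
    """Draw coefficient vectors around beta_hat without refitting.

    Assumption: a one-step multiplier bootstrap with standard normal
    weights; `tau` inflates the spread of the draws.
    """
    xtx_inv = np.linalg.inv(X.T @ X)
    weights = rng.standard_normal((n_boot, X.shape[0]))
    return beta_hat + tau * (weights * resid) @ X @ xtx_inv

def select_features(X, y, n_boot=1000, tau=None, seed=0):
    """Fit the full model once, then evaluate p+1 models via e-values."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    tau = np.log(n) if tau is None else tau  # assumed default, not from the source
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # the only model fit
    resid = y - X @ beta_hat

    # Two independent sets of draws: one as reference, one to evaluate.
    reference = resample_coefs(X, resid, beta_hat, n_boot, tau, rng)
    draws = resample_coefs(X, resid, beta_hat, n_boot, tau, rng)

    # e-value of the full model: mean depth of its own draws.
    e_full = mahalanobis_depth(draws, reference).mean()

    # e-value of each drop-one model: zero out coefficient j in every draw.
    e_drop = np.empty(p)
    for j in range(p):
        reduced = draws.copy()
        reduced[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(reduced, reference).mean()

    # Assumed rule: feature j is essential if dropping it lowers the e-value.
    selected = np.flatnonzero(e_drop < e_full)
    return selected, e_full, e_drop

# Synthetic example: three informative features, three pure-noise features.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((500, 6))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + data_rng.standard_normal(500)

selected, e_full, e_drop = select_features(X, y)
print("selected feature indices:", selected)
print("full-model e-value:", e_full)
print("drop-one e-values:", e_drop)
```

Note that the loop performs no refitting: each of the p drop-one models is evaluated by reusing the draws from the single full-model fit, which is where the saving over an exhaustive search of $2^p$ subsets would come from in this sketch.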
