ML | LG | Sep 13, 2024

Model-independent variable selection via the rule-based variable priority

arXiv:2409.09003v3 | 3 citations | h-index: 57
Originality: Incremental advance
AI Analysis

The paper addresses the need for a model-independent variable selection method that is easy to use across regression, classification, and survival settings. The contribution appears incremental because it builds on existing selection techniques.

The paper tackles the problem of selecting a small number of explanatory features in machine learning. It introduces Variable Priority (VarPro), a model-independent method that avoids generating artificial data. VarPro consistently filters out noise variables and shows balanced performance that compares favorably with state-of-the-art methods.

While achieving high prediction accuracy is a fundamental goal in machine learning, an equally important task is finding a small number of features with high explanatory power. One popular selection technique is permutation importance, which assesses a variable's impact by measuring the change in prediction error after permuting the variable. However, this can be problematic due to the need to create artificial data, a problem shared by other methods as well. Another problem is that variable selection methods can be limited by being model-specific. We introduce a new model-independent approach, Variable Priority (VarPro), which works by utilizing rules without the need to generate artificial data or evaluate prediction error. The method is relatively easy to use, requiring only the calculation of sample averages of simple statistics, and can be applied to many data settings, including regression, classification, and survival. We investigate the asymptotic properties of VarPro and show, among other things, that VarPro has a consistent filtering property for noise variables. Empirical studies using synthetic and real-world data show the method achieves a balanced performance and compares favorably to many state-of-the-art procedures currently used for variable selection.
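
For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of permutation importance, not of VarPro itself. It assumes squared-error loss, a toy two-column dataset, and a hypothetical `predict` function standing in for a fitted model; none of these names come from the paper. The shuffled rows are the "artificial data" the abstract refers to.

```python
import random

def mse(predict, rows, targets):
    """Mean squared prediction error over a dataset."""
    return sum((t - predict(r)) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    n_cols = len(rows[0])
    scores = []
    for j in range(n_cols):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Artificial rows: column j no longer lines up with the target.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(predict, permuted, targets) - baseline
        scores.append(total / n_repeats)
    return scores

# Toy data (an assumption for illustration): the target depends on
# column 0 only; column 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
targets = [3.0 * r[0] + data_rng.gauss(0, 0.1) for r in rows]

# Hypothetical stand-in for a fitted model; it ignores column 1 entirely.
def predict(row):
    return 3.0 * row[0]

scores = permutation_importance(predict, rows, targets)
```

In this sketch the score for column 0 is large, because shuffling it destroys the signal, while the score for column 1 is zero, because the stand-in model never reads that column. Note that the procedure needs both a fitted model and a prediction-error measure, the two requirements the abstract says VarPro avoids.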
