LG STAT-MECH MLJul 31, 2025

Optimised Feature Subset Selection via Simulated Annealing

Fernando Martínez-García, Álvaro Rubio-García, Samuel Fernández-Lorenzo, Juan José García-Ripoll, Diego Porras

arXiv:2507.23568v11 citationsh-index: 4Machine Learning: Science and Technology

Originality Incremental advance

AI Analysis

This provides an incremental improvement for designing interpretable models in high-dimensional settings where sparsity and performance are critical.

The paper tackled feature selection in high-dimensional classification by proposing SA-FDR, a simulated annealing-based algorithm that selects compact feature subsets using the Fisher discriminant ratio, achieving high predictive accuracy on datasets with up to hundreds of thousands of samples and hundreds of features.

We introduce SA-FDR, a novel algorithm for $\ell_0$-norm feature selection that considers this task as a combinatorial optimisation problem and solves it by using simulated annealing to perform a global search over the space of feature subsets. The optimisation is guided by the Fisher discriminant ratio, which we use as a computationally efficient proxy for model quality in classification tasks. Our experiments, conducted on datasets with up to hundreds of thousands of samples and hundreds of features, demonstrate that SA-FDR consistently selects more compact feature subsets while achieving a high predictive accuracy. This ability to recover informative yet minimal sets of features stems from its capacity to capture inter-feature dependencies often missed by greedy optimisation approaches. As a result, SA-FDR provides a flexible and effective solution for designing interpretable models in high-dimensional settings, particularly when model sparsity, interpretability, and performance are crucial.

View on arXiv PDF

Similar