LG AI | May 21

Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection

Muhammad Rajabinasab, Michael E. Houle, Oussama Chelly, Arthur Zimek

arXiv:2605.22973

AI Analysis

For researchers developing unsupervised feature selection methods, this work highlights the need for a simple baseline to avoid overclaiming improvements.

The paper shows that many state-of-the-art unsupervised feature selection methods are outperformed by random feature selection in both performance and efficiency, and argues for using random selection as a mandatory baseline.

Many novel unsupervised feature selection methods are proposed each year, yet their empirical evaluation is limited to supervised and unsupervised evaluation metrics computed on selected datasets, along with comparisons to existing methods. In the absence of an established evaluation baseline, it is difficult to determine how much value each of these methods adds to the existing literature, and how effective their underlying approaches are. We propose using random feature selection as a baseline for evaluating unsupervised feature selection methods. We empirically show that many state-of-the-art methods in unsupervised feature selection are outperformed by random feature selection in both performance and efficiency. Accordingly, we emphasize the strict requirement of using random feature selection as a baseline when developing novel unsupervised feature selection methods, to ensure a consistent improvement over random feature selection.
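A random-selection baseline of the kind the abstract proposes could be sketched as follows. This is only an illustration, not the authors' protocol: the function names, the number of repeats, the comparison against the mean random score, and the `evaluate` callback (any metric where higher is better, computed on a chosen subset of feature indices) are all assumptions introduced here.

```python
import random
import statistics
from typing import Callable, List


def random_feature_subset(n_features: int, k: int, rng: random.Random) -> List[int]:
    """Draw k distinct feature indices uniformly at random (hypothetical helper)."""
    return sorted(rng.sample(range(n_features), k))


def random_baseline_scores(
    n_features: int,
    k: int,
    evaluate: Callable[[List[int]], float],
    n_repeats: int = 30,  # assumed value; the source does not state a repeat count
    seed: int = 0,
) -> List[float]:
    """Score n_repeats independent random subsets with a user-supplied metric."""
    rng = random.Random(seed)
    return [
        evaluate(random_feature_subset(n_features, k, rng))
        for _ in range(n_repeats)
    ]


def beats_random_baseline(method_score: float, baseline_scores: List[float]) -> bool:
    """True if the method scores above the mean random-selection score.

    Assumes higher scores are better; comparing against the mean is one
    simple choice among many (a statistical test would be stricter).
    """
    return method_score > statistics.mean(baseline_scores)
```

In use, `evaluate` would wrap whatever downstream metric the study reports (for example, clustering quality on the data restricted to the selected columns), and the same `k` would be given to both the proposed method and the random baseline so the comparison is like for like.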

