LG · May 20

Objective-Induced Bias and Search Dynamics in Multiobjective Unsupervised Feature Selection

arXiv:2605.21561
AI Analysis

This work provides guidance for practitioners on objective design in multiobjective unsupervised feature selection, highlighting pitfalls of silhouette-based objectives and the effectiveness of PCA loss.

The paper investigates how the choice of evaluation objective and the direction of subset-size regularisation affect search dynamics and solution quality in multiobjective unsupervised feature selection. Using synthetic data, it finds that silhouette-based objectives are biased toward trivial low-cardinality solutions, while PCA reconstruction loss yields compact subsets whose test accuracy is comparable to that of subsets obtained by directly optimising supervised accuracy.

Unsupervised feature selection is commonly formulated as a multiobjective optimisation problem that jointly optimises subset quality and subset size. Yet the behaviour of this formulation depends critically on the choice of evaluation objective, the direction of subset-size regularisation, and the initialisation strategy. We study these factors in a controlled setting using a synthetic dataset with known informative, redundant, and irrelevant feature types. Six formulations are compared, each pairing one of three evaluation objectives (accuracy, silhouette score, or PCA reconstruction loss) with one of two subset-size directions (minimisation or maximisation). The results show that the formulation strongly affects both search dynamics and the quality of the resulting Pareto front. Silhouette-based formulations exhibit a strong bias toward trivial low-cardinality solutions and remain weak proxies for predictive performance. In contrast, the proposed PCA loss objective produces compact subsets with test accuracy comparable to that of subsets obtained by directly optimising supervised accuracy. Overall, the study shows that objective design is central to effective multiobjective unsupervised feature selection.
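
To make the formulation concrete, the following is a minimal sketch of one of the six combinations: a PCA-based loss paired with a subset-size objective whose direction can be flipped. The abstract does not define the PCA loss, so the definition here is an assumption: the selected columns are compressed to a few principal-component scores, and the loss is the mean squared error of reconstructing the full centred data from those scores. The function names (`pca_reconstruction_loss`, `pareto_front`, `evaluate`), the `n_components` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_reconstruction_loss(X, mask, n_components=2):
    """Assumed loss: MSE of rebuilding all centred columns of X from the
    top principal-component scores of the selected columns only."""
    Xc = X - X.mean(axis=0)
    Xs = Xc[:, mask]
    if Xs.shape[1] == 0:
        # Empty subset: nothing is reconstructed, so the loss is the raw spread.
        return float(np.mean(Xc ** 2))
    k = min(n_components, Xs.shape[1])
    u, s, _ = np.linalg.svd(Xs, full_matrices=False)
    scores = u[:, :k] * s[:k]                      # PC scores of the subset
    coef, *_ = np.linalg.lstsq(scores, Xc, rcond=None)
    return float(np.mean((Xc - scores @ coef) ** 2))

def pareto_front(points):
    """Indices of non-dominated points; every coordinate is minimised."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        no_worse = np.all(pts <= p, axis=1)
        better = np.any(pts < p, axis=1)
        if not np.any(no_worse & better):
            keep.append(i)
    return keep

def evaluate(X, masks, size_direction="min"):
    """Objective pairs (loss, signed size). Negating the size turns
    subset-size maximisation into a minimisation problem."""
    sign = 1.0 if size_direction == "min" else -1.0
    return [(pca_reconstruction_loss(X, m), sign * int(m.sum())) for m in masks]

# Toy data (hypothetical): 2 informative, 1 redundant, 2 irrelevant columns.
rng = np.random.default_rng(0)
informative = rng.normal(size=(500, 2))
redundant = informative.sum(axis=1, keepdims=True)
irrelevant = rng.normal(size=(500, 2))
X = np.hstack([informative, redundant, irrelevant])

masks = [np.array(m, dtype=bool) for m in (
    [1, 1, 0, 0, 0],   # informative only
    [0, 0, 0, 1, 1],   # irrelevant only
    [1, 1, 1, 1, 1],   # all features
)]
objectives = evaluate(X, masks, size_direction="min")
front = pareto_front(objectives)
```

Under this assumed loss, a subset made only of irrelevant columns reconstructs the informative and redundant columns poorly, so it scores worse than an equally small informative subset. Switching `size_direction` to `"max"` changes which subsets are non-dominated without changing the loss itself, which is the kind of formulation effect the abstract describes.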
