LG · Apr 15

Scalable unsupervised feature selection via weight stability

arXiv:2506.06114 · 49.81 citations · h-index: 15 · Has Code

Predicted impact top 68% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For practitioners working on high-dimensional clustering, this work offers a scalable, theoretically grounded method for unsupervised feature selection. It is an incremental advance, since it builds on existing weighted k-means frameworks.

The paper proposes two unsupervised feature selection algorithms, FS-MWK++ and SFS-MWK++, based on a novel initialization strategy for Minkowski Weighted k-means. The methods identify stable features by aggregating weights across Minkowski exponents, with theoretical guarantees that relevant features receive higher weights than noise features under certain assumptions.

Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce Minkowski Weighted k-means++, a novel initialisation strategy for Minkowski Weighted k-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms: FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical analysis demonstrating that, under explicit assumptions on noise features and cluster structure, relevant features are assigned consistently higher weights than noise features across a range of Minkowski exponents. Our software can be found at https://github.com/xzhang4-ops1/FSMWK.
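To make the idea of aggregating feature weights across Minkowski exponents concrete, here is a minimal sketch. It is not the authors' FS-MWK++ implementation. It assumes the commonly cited per-cluster weight update from the earlier Minkowski Weighted k-means literature, w_kv = 1 / sum_u (D_kv / D_ku)^(1/(p-1)), where D_kv is the within-cluster dispersion of feature v. It also assumes a fixed, known partition in place of the full clustering loop and the proposed initialisation, cluster means as centres for every exponent, and a plain average as the aggregation rule. The names `mwk_weights`, `aggregate_weights`, and `make_toy_data` are hypothetical.

```python
import numpy as np


def mwk_weights(X, labels, centroids, p, eps=1e-12):
    """Per-cluster feature weights for a fixed partition (requires p > 1).

    Assumed update: w_kv = 1 / sum_u (D_kv / D_ku)^(1/(p-1)).
    """
    k, m = centroids.shape
    W = np.empty((k, m))
    for c in range(k):
        # Within-cluster Minkowski dispersion of each feature.
        D = (np.abs(X[labels == c] - centroids[c]) ** p).sum(axis=0)
        D = D + eps  # assumption: tiny constant to avoid division by zero
        ratios = (D[:, None] / D[None, :]) ** (1.0 / (p - 1.0))
        W[c] = 1.0 / ratios.sum(axis=1)  # each row sums to 1
    return W


def aggregate_weights(X, labels, exponents):
    """Average feature weights over clusters and exponents (assumed rule).

    Labels are assumed to be integers 0..k-1.
    """
    k = int(labels.max()) + 1
    # Assumption: cluster means serve as centres for every exponent.
    centroids = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
    per_p = [mwk_weights(X, labels, centroids, p).mean(axis=0) for p in exponents]
    return np.mean(per_p, axis=0)


def make_toy_data(seed=0, n=100):
    """Two clusters: features 0-1 carry structure, features 2-4 are noise."""
    rng = np.random.default_rng(seed)
    relevant = np.vstack([
        rng.normal(0.0, 0.3, size=(n, 2)),
        rng.normal(5.0, 0.3, size=(n, 2)),
    ])
    noise = rng.uniform(0.0, 5.0, size=(2 * n, 3))
    labels = np.repeat([0, 1], n)
    return np.hstack([relevant, noise]), labels


if __name__ == "__main__":
    X, labels = make_toy_data()
    w = aggregate_weights(X, labels, exponents=[1.5, 2.0, 3.0])
    print(np.round(w, 3))  # features 0 and 1 should receive the largest weights
```

On this toy data the two structured features have a much smaller within-cluster dispersion than the uniform noise features, so they receive higher weights at every exponent, and averaging preserves that ordering. A feature whose weight stays high across exponents is the kind of stable feature the abstract describes. How the paper actually aggregates and thresholds the weights is not stated in the abstract.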
