Feature Selection For High-Dimensional Clustering
This work addresses feature selection for clustering high-dimensional data. It is an incremental advance whose theoretical contributions are explicit error bounds for the resulting clustering and for mode-based clustering.
The paper tackles the problem of selecting informative features for high-dimensional clustering. It introduces a nonparametric method that screens features with a multimodality test, then applies kernel density estimation and mode clustering to the selected features. The authors give explicit bounds on the clustering error rate and the first error bounds for mode-based clustering.
We present a nonparametric method for selecting informative features in high-dimensional clustering problems. We start with a screening step that uses a test for multimodality. Then we apply kernel density estimation and mode clustering to the selected features. The output of the method consists of a list of relevant features and cluster assignments. We provide explicit bounds on the error rate of the resulting clustering. In addition, we provide the first error bounds on mode-based clustering.
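The two-stage pipeline described above (screen features for multimodality, then run mode clustering on a kernel density estimate of the retained features) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' procedure: the abstract does not say which multimodality test is used, so the screening step here is a crude stand-in heuristic that counts well-separated peaks in a one-dimensional Gaussian KDE. The mode-clustering step is a plain Gaussian mean-shift iteration. All function names (`kde_1d`, `looks_multimodal`, `mode_cluster`, `select_and_cluster`), the rule-of-thumb bandwidth, the `valley_ratio` threshold, and the toy data are assumptions introduced for illustration.

```python
import math
from statistics import NormalDist, stdev

def kde_1d(values, grid, h):
    """Gaussian kernel density estimate of `values`, evaluated at each grid point."""
    norm = len(values) * h * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) / norm
            for g in grid]

def looks_multimodal(values, valley_ratio=0.5, grid_size=200):
    """Stand-in screening heuristic (NOT a formal test): flag a feature when two
    adjacent KDE peaks are separated by a valley below valley_ratio * lower peak."""
    h = 1.06 * stdev(values) * len(values) ** -0.2  # rule-of-thumb bandwidth (assumed)
    if h == 0:
        return False  # constant feature carries no cluster structure
    lo, hi = min(values) - 3 * h, max(values) + 3 * h
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = kde_1d(values, grid, h)
    peaks = [i for i in range(1, grid_size - 1)
             if dens[i - 1] < dens[i] >= dens[i + 1]]
    return any(min(dens[a:b + 1]) < valley_ratio * min(dens[a], dens[b])
               for a, b in zip(peaks, peaks[1:]))

def mean_shift_mode(x, data, h, iters=500, tol=1e-6):
    """Move x uphill on the Gaussian KDE of `data` until the step size is tiny."""
    for _ in range(iters):
        w = [math.exp(-0.5 * math.dist(x, p) ** 2 / h ** 2) for p in data]
        total = sum(w)
        new = [sum(wi * p[j] for wi, p in zip(w, data)) / total
               for j in range(len(x))]
        if math.dist(new, x) < tol:
            return new
        x = new
    return x

def mode_cluster(data, h, merge_tol=1e-2):
    """Assign each point to the mode its mean-shift path converges to."""
    modes, labels = [], []
    for x in data:
        m = mean_shift_mode(x, data, h)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) < merge_tol:
                labels.append(k)
                break
        else:  # no existing mode is close enough: record a new one
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes

def select_and_cluster(rows, h):
    """Screen features one at a time, then mode-cluster on the retained ones."""
    d = len(rows[0])
    selected = [j for j in range(d) if looks_multimodal([r[j] for r in rows])]
    reduced = [[r[j] for j in selected] for r in rows]
    labels, _ = mode_cluster(reduced, h)
    return selected, labels

# Deterministic toy data: feature 0 has two groups (centers 0 and 6),
# feature 1 is unimodal noise that is unrelated to the groups.
nd = NormalDist()
q20 = [nd.inv_cdf((i + 0.5) / 20) for i in range(20)]
q40 = [nd.inv_cdf((i + 0.5) / 40) for i in range(40)]
rows = [[(0.0 if i < 20 else 6.0) + 0.5 * q20[i % 20], q40[(7 * i) % 40]]
        for i in range(40)]
selected, labels = select_and_cluster(rows, h=1.0)
print("selected features:", selected)
print("cluster labels:", labels)
```

The output mirrors what the abstract describes: a list of retained feature indices and one cluster label per observation. Screening each feature separately keeps the first stage cheap in high dimensions, and the density-based second stage then runs only on the reduced feature set. The bandwidth `h` passed to `select_and_cluster` is a free tuning parameter in this sketch.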