LG | Apr 23

Assessing the impact of dimensionality reduction on clustering performance -- a systematic study

Ousmane Assani Amate, Mohammadreza Bakhtyari, Émilie Roy, Vladimir Makarenkov

arXiv:2604.22099 | 17.8 | h-index: 1

AI Analysis

Provides practical guidance for practitioners selecting dimensionality reduction methods for clustering, but the findings are incremental and confirm known dependencies.

This study systematically evaluates the impact of five dimensionality reduction techniques on four clustering algorithms across various reduction levels, finding that performance depends heavily on the choice of technique and reduction level relative to data geometry and clustering method.

Dimensionality reduction is a critical preprocessing step for clustering high-dimensional data, yet comprehensive evaluations of its impact across diverse methods and data types remain limited. In this study, we systematically assess the influence of five dimensionality reduction techniques, namely Principal Component Analysis (PCA), Kernel Principal Component Analysis (Kernel PCA), Variational Autoencoder (VAE), Isometric Mapping (Isomap), and Multidimensional Scaling (MDS), on the performance of four popular clustering algorithms: k-means, Agglomerative Hierarchical Clustering (AHC), Gaussian Mixture Models (GMM), and Ordering Points to Identify the Clustering Structure (OPTICS). We evaluate clustering quality using the Adjusted Rand Index (ARI), comparing results with and without dimensionality reduction at the reduction levels recommended in the literature (i.e., k-1 dimensions, where k is the number of clusters, and 25% and 50% of the original number of dimensions). Our findings underscore the importance of carefully selecting both the dimensionality reduction technique and the reduction level, which should be tailored to the intrinsic geometry of the data and to the clustering algorithm under consideration.
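To make the evaluation setup concrete, here is a minimal Python sketch of its two ingredients: the ARI between a reference partition and a clustering result, and the three target dimensionalities (k-1, 25%, and 50% of the original dimensions). It is an illustration under stated assumptions, not the authors' code. The function names `adjusted_rand_index` and `reduction_levels` are hypothetical, and the rounding and clamping of the target dimensions are assumptions, since the abstract does not say how fractional values are handled.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement

    # Contingency counts: how many items fall in (true cluster i, predicted cluster j).
    contingency = Counter(zip(labels_true, labels_pred))
    row_sizes = Counter(labels_true)
    col_sizes = Counter(labels_pred)

    # Number of item pairs placed together in both, in the first, and in the second partition.
    index = sum(comb(c, 2) for c in contingency.values())
    pairs_true = sum(comb(c, 2) for c in row_sizes.values())
    pairs_pred = sum(comb(c, 2) for c in col_sizes.values())

    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        # Only happens when both partitions are a single cluster or all singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


def reduction_levels(n_features, n_clusters):
    """Target dimensionalities for the three reduction levels named in the abstract.

    Assumption: fractional values are rounded, and every level is clamped
    to the range [1, n_features].
    """
    def clamp(d):
        return max(1, min(n_features, d))

    return {
        "k-1": clamp(n_clusters - 1),
        "25%": clamp(round(0.25 * n_features)),
        "50%": clamp(round(0.50 * n_features)),
    }


if __name__ == "__main__":
    # Same partition up to a relabeling of the clusters.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(reduction_levels(n_features=100, n_clusters=5))
```

Because ARI is invariant to how clusters are labeled and is corrected for chance agreement, the scores obtained with and without dimensionality reduction can be compared directly across clustering algorithms.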
