LG Jan 21, 2024

Enabling clustering algorithms to detect clusters of varying densities through scale-invariant data preprocessing

Sunil Aryal, Jonathan R. Wells, Arbind Agrahari Baniya, KC Santosh

arXiv:2401.11402v1 2.6

Originality: Incremental advance

AI Analysis

This work addresses a common limitation of clustering in data analysis, though it appears incremental because it builds on existing rank transformations.

The paper tackles the failure of clustering algorithms to detect clusters of varying densities. It introduces ARES, a scale-invariant data preprocessing method, and reports improved performance for the KMeans, DBSCAN, and DP algorithms on real-world datasets.

In this paper, we show that preprocessing data using a variant of rank transformation called 'Average Rank over an Ensemble of Sub-samples (ARES)' makes clustering algorithms robust to data representation and enables them to detect varying density clusters. Our empirical results, obtained using three of the most widely used clustering algorithms (KMeans, DBSCAN, and DP (Density Peak)) across a wide range of real-world datasets, show that clustering after ARES transformation produces better and more consistent results.
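The abstract names the transformation but does not spell out its steps, so the following is only a minimal sketch of one plausible reading of "average rank over an ensemble of sub-samples": for each feature, draw several random sub-samples, rank every value against each sub-sample, and average those ranks. The function name `ares_transform`, the parameters `n_subsamples` and `subsample_size`, their defaults, and the rank definition (count of sub-sample values less than or equal to the value) are all assumptions made for illustration, not details taken from the paper.

```python
import bisect
import random


def ares_transform(rows, n_subsamples=10, subsample_size=8, seed=0):
    """Replace each feature value by its average rank over random sub-samples.

    Hypothetical sketch: parameter names, defaults, and the rank definition
    are assumptions, not the paper's specification.
    """
    n = len(rows)
    d = len(rows[0])
    psi = min(subsample_size, n)  # a sub-sample cannot exceed the data size
    rng = random.Random(seed)
    out = [[0.0] * d for _ in range(n)]

    for j in range(d):
        column = [row[j] for row in rows]
        for _ in range(n_subsamples):
            # Draw a sub-sample of positions without replacement.
            idx = rng.sample(range(n), psi)
            sample = sorted(column[i] for i in idx)
            for i, value in enumerate(column):
                # Rank = number of sub-sample values <= value (0..psi).
                out[i][j] += bisect.bisect_right(sample, value)

    # Average the accumulated ranks over the ensemble.
    return [[total / n_subsamples for total in row] for row in out]
```

Because only the ordering of values within each feature is used, applying a strictly increasing rescaling to any feature (for example, a change of units or a cube of non-negative values) leaves the output of this sketch unchanged for a fixed seed. The transformed rows could then be passed to a clustering algorithm such as KMeans or DBSCAN in place of the raw data.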
