LGMEMLJan 9, 2025

Cluster Catch Digraphs with the Nearest Neighbor Distance

arXiv:2501.06268v2h-index: 17
Originality Incremental advance
AI Analysis

This is an incremental improvement for clustering high-dimensional datasets, addressing specific limitations in graph-based methods.

The authors tackled the problem of clustering high-dimensional data by introducing a new variant of spatial randomness test using nearest neighbor distance in Cluster Catch Digraphs, which outperformed or matched existing methods in Monte Carlo and real-world evaluations.

We introduce a new method for clustering based on Cluster Catch Digraphs (CCDs). The new method addresses the limitations of RK-CCDs by employing a new variant of spatial randomness test that employs the nearest neighbor distance (NND) instead of the Ripley's K function used by RK-CCDs. We conduct a comprehensive Monte Carlo analysis to assess the performance of our method, considering factors such as dimensionality, data set size, number of clusters, cluster volumes, and inter-cluster distance. Our method is particularly effective for high-dimensional data sets, comparable to or outperforming KS-CCDs and RK-CCDs that rely on a KS-type statistic or the Ripley's K function. We also evaluate our methods using real and complex data sets, comparing them to well-known clustering methods. Again, our methods exhibit competitive performance, producing high-quality clusters with desirable properties. Keywords: Graph-based clustering, Cluster catch digraphs, High-dimensional data, The nearest neighbor distance, Spatial randomness test

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes