MLLGMar 10, 2023

Clustering with minimum spanning trees: How good can it be?

arXiv:2303.05679v420 citationsh-index: 19
Originality Incremental advance
AI Analysis

This work addresses the problem of evaluating and improving clustering algorithms for pattern recognition, though it is incremental in extending existing MST-based methods.

The paper quantifies the performance of minimum spanning tree (MST) methods in low-dimensional clustering tasks, finding they are competitive with oracle algorithms and often outperform non-MST methods like K-means and spectral clustering.

Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes