Clustering with minimum spanning trees: How good can it be?
This work addresses the problem of evaluating and improving clustering algorithms for pattern recognition, though it is incremental in extending existing MST-based methods.
The paper quantifies the performance of minimum spanning tree (MST) methods in low-dimensional clustering tasks, finding they are competitive with oracle algorithms and often outperform non-MST methods like K-means and spectral clustering.
Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.