DS GT LG | Sep 22, 2024

A High-Performance External Validity Index for Clustering with a Large Number of Clusters

arXiv:2409.14455v1 | 1.2 | h-index: 9

Originality: Incremental advance

AI Analysis

SMBP offers a scalable, practical solution for researchers and practitioners working on big data clustering tasks, although the contribution is incremental because it builds on existing matching frameworks.

This paper addresses the problem of evaluating clustering quality on large-scale datasets with many clusters. It introduces the Stable Matching Based Pairing (SMBP) algorithm, which reduces the computational complexity of pairing clusters from $O(N^3)$ to $O(N^2)$ while maintaining accuracy comparable to traditional methods.

This paper introduces the Stable Matching Based Pairing (SMBP) algorithm, a high-performance external validity index for evaluating clusterings of large-scale datasets with a large number of clusters. SMBP uses the stable matching framework to pair clusters across different clustering methods, reducing the computational complexity to $O(N^2)$, compared with $O(N^3)$ for traditional Maximum Weighted Matching (MWM). Comprehensive evaluations on real-world and synthetic datasets show that SMBP achieves accuracy comparable to MWM with superior computational efficiency. It is particularly effective on balanced, unbalanced, and large-scale datasets with a large number of clusters, making it a scalable and practical solution for modern clustering tasks. SMBP is also easy to implement in machine learning frameworks such as PyTorch and TensorFlow, offering a robust tool for big data applications. Extensive experiments position it as a powerful alternative to existing methods such as the Maximum Match Measure (MMM) and the Centroid Ratio (CR).
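To make the pairing idea concrete, here is a minimal sketch of stable-matching-based cluster pairing, assuming two flat labelings with integer cluster ids starting at 0. It builds the overlap (contingency) table between the two clusterings, runs textbook Gale-Shapley with clusters of the first labeling proposing and both sides ranking partners by overlap, and scores the result as the fraction of points lying in matched cluster pairs. The function names (`contingency`, `stable_pairing`, `pairing_score`, `best_pairing_bruteforce`) are hypothetical, and the scoring rule is an assumption for illustration; this is not necessarily the authors' exact SMBP formulation or index. The brute-force optimum is only an exponential stand-in for MWM on tiny inputs, not an $O(N^3)$ solver.

```python
from itertools import permutations


def contingency(labels_a, labels_b):
    """Overlap table: table[i][j] = number of points in cluster i of A and cluster j of B."""
    ka, kb = max(labels_a) + 1, max(labels_b) + 1
    table = [[0] * kb for _ in range(ka)]
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table


def stable_pairing(table):
    """Gale-Shapley with A-clusters proposing; both sides rank partners by overlap."""
    ka, kb = len(table), len(table[0])
    # Preference list of each A-cluster: B-clusters ordered by decreasing overlap.
    prefs = [sorted(range(kb), key=lambda j: -table[i][j]) for i in range(ka)]
    next_choice = [0] * ka      # position in prefs[i] of i's next proposal
    partner_of_b = [None] * kb  # current A-partner of each B-cluster
    free = list(range(ka))
    while free:
        i = free.pop()
        if next_choice[i] == kb:    # i has proposed to every B-cluster; it stays unmatched
            continue
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = partner_of_b[j]
        if current is None:
            partner_of_b[j] = i
        elif table[i][j] > table[current][j]:
            partner_of_b[j] = i     # j trades up to the larger overlap
            free.append(current)    # the displaced A-cluster proposes again later
        else:
            free.append(i)          # j rejects i; i moves down its list
    return {i: j for j, i in enumerate(partner_of_b) if i is not None}


def pairing_score(table, pairing, n_points):
    """Assumed index: fraction of points lying in a matched (A, B) cluster pair."""
    return sum(table[i][j] for i, j in pairing.items()) / n_points


def best_pairing_bruteforce(table):
    """Exact maximum-weight pairing by enumeration; only for tiny inputs with k_A <= k_B."""
    ka, kb = len(table), len(table[0])
    best = max(permutations(range(kb), ka),
               key=lambda perm: sum(table[i][perm[i]] for i in range(ka)))
    return dict(enumerate(best))


if __name__ == "__main__":
    labels_a = [0, 0, 0, 1, 1, 2, 2, 2]
    labels_b = [1, 1, 0, 0, 0, 2, 2, 1]
    table = contingency(labels_a, labels_b)
    stable = stable_pairing(table)
    exact = best_pairing_bruteforce(table)
    print(pairing_score(table, stable, len(labels_a)))  # 0.75
    print(pairing_score(table, exact, len(labels_a)))   # 0.75
```

Each cluster of the first labeling proposes to each cluster of the second at most once, so this sketch makes at most $k_A \cdot k_B$ proposals, a bound quadratic in the number of clusters (sorting the preference lists adds a logarithmic factor here). On the toy input above the stable pairing coincides with the exact optimum, but in general it need not: for the overlap table `[[3, 2], [2, 0]]` the stable pairing totals 3 + 0 = 3, while the optimum totals 2 + 2 = 4. That gap is why the abstract compares SMBP's accuracy against MWM.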

