CV · Mar 23

CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering

arXiv:2603.26744 · 47.14 citations · h-index: 4
AI Analysis

For data mining practitioners, CNMBI provides a robust method for cluster number determination that works on complex, high-dimensional data without distribution assumptions.

CNMBI determines the optimal number of clusters without prior information in two ways. It recasts the task as a dynamic comparison of cluster centers, modeled with bipartite graph theory, and it actively removes low-confidence samples. The authors report that it outperforms state-of-the-art methods on challenging datasets and handles complex image data such as CIFAR-10 and STL-10.
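The abstract does not say how the bipartite graph is built or scored. As a minimal sketch, assume the two sides of the graph are two sets of k cluster centers (for example, from two clustering runs on different subsamples) and that edges are weighted by Euclidean distance. A minimum-cost perfect matching then pairs each center with its counterpart. The function name `min_cost_center_matching` is hypothetical. This is a generic center-matching step, not CNMBI's actual procedure.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def min_cost_center_matching(a: List[Point], b: List[Point]) -> Tuple[List[int], float]:
    """Hypothetical helper: minimum-cost perfect matching between two center sets.

    The centers in `a` and `b` form the two sides of a complete bipartite graph
    whose edge weights are Euclidean distances. Returns `perm`, where a[i] is
    matched to b[perm[i]], together with the total matched distance.
    Brute force over permutations is exact but O(k!), so keep k small (k <= 8).
    """
    if len(a) != len(b):
        raise ValueError("center sets must have the same size")
    best_perm: Optional[List[int]] = None
    best_cost = math.inf
    for perm in itertools.permutations(range(len(b))):
        cost = sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = list(perm), cost
    return best_perm, best_cost


if __name__ == "__main__":
    # Two slightly perturbed copies of the same three centers, listed in different orders.
    run1 = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    run2 = [(0.0, 10.5), (0.5, 0.0), (10.0, 0.5)]
    perm, cost = min_cost_center_matching(run1, run2)
    print(perm, cost)  # [1, 2, 0] 1.5
```

Brute-force enumeration keeps the sketch dependency-free. For larger k, the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment` on a distance matrix) solves the same assignment problem in polynomial time. A small total matched distance indicates that the centers are stable across runs. How CNMBI turns such comparisons into a cluster count is not described in the abstract.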

One of the main challenges in data mining is choosing the optimal number of clusters without prior information. Existing methods usually follow the philosophy of cluster validation and therefore carry underlying assumptions about the data distribution. This prevents their application to complex real-world data such as large-scale image collections and high-dimensional data. To address this, we propose an approach named CNMBI. Leveraging the distribution information inherent in the data space, we map the target task to a dynamic comparison process between cluster centers based on their positional behavior. This avoids relying on complete clustering results or designing a complex validity index, as previous methods do. Bipartite graph theory is then employed to model this process efficiently. We also observe that different samples have different confidence levels, and we therefore actively remove low-confidence ones. To our knowledge, this is the first time sample confidence has been considered in cluster number determination. CNMBI is robust and offers more flexibility regarding the dimension and shape of the target data (e.g., CIFAR-10 and STL-10). Extensive comparison studies with state-of-the-art competitors on various challenging datasets demonstrate the superiority of our method.
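The abstract states that low-confidence samples are removed, but it does not say how confidence is measured. One common stand-in, used here purely as an assumption, is the margin between a sample's nearest and second-nearest center. A sample that is almost equidistant from two centers sits near a cluster boundary. The names `center_margin_confidence` and `filter_low_confidence`, and the default threshold of 0.3, are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def center_margin_confidence(x: Point, centers: List[Point]) -> float:
    """Hypothetical confidence score in [0, 1]: 1 - d_nearest / d_second_nearest.

    A point equidistant from its two closest centers (on a boundary) scores 0.
    A point lying exactly on a center scores 1.
    """
    if len(centers) < 2:
        return 1.0  # no competing center, so no boundary to be close to
    d = sorted(math.dist(x, c) for c in centers)
    if d[1] == 0.0:
        return 0.0  # two coincident centers at x: the margin is undefined
    return 1.0 - d[0] / d[1]


def filter_low_confidence(points: List[Point], centers: List[Point],
                          threshold: float = 0.3) -> List[Point]:
    """Keep only points whose margin confidence is at least `threshold` (hypothetical default)."""
    return [p for p in points if center_margin_confidence(p, centers) >= threshold]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]
    points = [(1.0, 0.0), (5.0, 0.0), (9.0, 0.0), (4.5, 0.0)]
    # (5.0, 0.0) lies on the boundary (confidence 0), and (4.5, 0.0) scores about 0.18.
    print(filter_low_confidence(points, centers))  # [(1.0, 0.0), (9.0, 0.0)]
```

A ratio-based margin is scale-free, so one threshold can be reused across datasets with different spreads. Whether CNMBI uses a distance ratio, a density estimate, or another criterion is not stated in the abstract.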

