LG · Mar 28

Kempe Swap K-Means: A Scalable Near-Optimal Solution for Semi-Supervised Clustering

arXiv:2603.27417 · 5.8

AI Analysis

For practitioners needing constrained clustering, this algorithm offers a scalable near-optimal solution, though it is an incremental improvement over existing methods.

This paper introduces Kempe Swap K-Means, a centroid-based algorithm for semi-supervised clustering with must-link and cannot-link constraints. It reports near-optimal partitions with high computational efficiency and scalability, and outperforms state-of-the-art benchmarks in both accuracy and efficiency on large-scale datasets.

This paper presents a novel centroid-based heuristic algorithm, termed Kempe Swap K-Means, for constrained clustering under rigid must-link (ML) and cannot-link (CL) constraints. The algorithm employs a dual-phase iterative process: an assignment step that uses Kempe chain swaps to refine the current clustering within the constrained solution space, and a centroid update step that computes optimal cluster centroids. To enhance global search capabilities and avoid local optima, the framework incorporates controlled perturbations during the update phase. Empirical evaluations demonstrate that the proposed method achieves near-optimal partitions while maintaining high computational efficiency and scalability. The results indicate that Kempe Swap K-Means consistently outperforms state-of-the-art benchmarks in both clustering accuracy and algorithmic efficiency on large-scale datasets.
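
To make the Kempe chain idea concrete, here is a minimal sketch of a single swap move, written under stated assumptions rather than taken from the paper. It treats cannot-link constraints as edges of a graph and cluster labels as colors. The Kempe chain for a point and a target cluster is the connected component, in the subgraph of points labeled with either of the two clusters, that contains the point. Exchanging the two labels along that chain cannot violate any cannot-link constraint. The function names (`kempe_chain`, `assignment_cost`, `try_kempe_swap`), the accept-if-cost-decreases rule, and the assumption that must-link groups have already been merged into single points are all illustrative choices, not details confirmed by the abstract.

```python
from collections import deque

import numpy as np


def kempe_chain(start, target, labels, cl_adj):
    """Return the set of points in the Kempe chain of `start`.

    The chain is the connected component containing `start` in the
    cannot-link graph restricted to points labeled labels[start] or `target`.
    `cl_adj` maps a point index to an iterable of cannot-link neighbors.
    """
    a, b = labels[start], target
    chain, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in cl_adj.get(u, ()):
            if v not in chain and labels[v] in (a, b):
                chain.add(v)
                queue.append(v)
    return chain


def assignment_cost(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(((X - centroids[labels]) ** 2).sum())


def try_kempe_swap(X, labels, centroids, cl_adj, start, target):
    """Swap labels along the Kempe chain if that lowers the assignment cost.

    Assumption: centroids stay fixed during the assignment step and a move
    is accepted only when the cost strictly decreases.
    Returns (labels, accepted); the input array is never modified.
    """
    a = labels[start]
    if a == target:
        return labels, False
    new_labels = labels.copy()
    for u in kempe_chain(start, target, labels, cl_adj):
        # Points labeled `a` move to `target`, and vice versa.
        new_labels[u] = target if labels[u] == a else a
    if assignment_cost(X, new_labels, centroids) < assignment_cost(X, labels, centroids):
        return new_labels, True
    return labels, False
```

A full loop would alternate such moves with a centroid update (the mean of each cluster) and, as the abstract describes, some form of perturbation; neither is shown here because the abstract does not specify them.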
