OCLGMar 6, 2025

A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems

arXiv:2503.04447v1h-index: 7
Originality Incremental advance
AI Analysis

This addresses the challenge of semi-supervised clustering without prior knowledge of cluster numbers for applications in data analysis, though it appears incremental as it builds on existing graph-based methods.

The paper tackles the semi-supervised clustering problem by formulating it as a graph partitioning task with a continuous optimization model that avoids needing the exact cluster number and ensures must-link constraints, achieving effectiveness and efficiency in numerical experiments.

Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem into a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to efficiently solve this model, and establish its convergence result. Based on the obtained solution, we can construct the clusters that theoretically meet the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of our proposed method through comprehensive numerical experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes