A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems
This addresses the challenge of semi-supervised clustering without prior knowledge of cluster numbers for applications in data analysis, though it appears incremental as it builds on existing graph-based methods.
The paper tackles the semi-supervised clustering problem by formulating it as a graph partitioning task with a continuous optimization model that avoids needing the exact cluster number and ensures must-link constraints, achieving effectiveness and efficiency in numerical experiments.
Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem into a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to efficiently solve this model, and establish its convergence result. Based on the obtained solution, we can construct the clusters that theoretically meet the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of our proposed method through comprehensive numerical experiments.