COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints
This addresses the challenge of ill-posed clustering for users who need to guide clustering systems with minimal constraints, though it is incremental as it builds on existing active constraint-based methods.
The paper tackles the problem of active constraint-based clustering by proposing COBRA, a method that over-clusters data and merges clusters using pairwise constraints, achieving state-of-the-art performance in clustering quality and runtime without needing the number of clusters in advance.
Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a $K$ that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.