DSAILGMLMay 24, 2018

Hierarchical Clustering with Structural Constraints

arXiv:1805.09476v250 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of incorporating prior structural constraints into hierarchical clustering, which is incremental as it builds on existing optimization viewpoints and improves upon current techniques for specific applications like taxonomy.

The paper tackles the problem of hierarchical clustering with structural constraints by providing provable approximation guarantees for two simple top-down algorithms, using a constraint-based regularization of the objective to handle conflicting prior information, and demonstrates the approach on a real dataset for taxonomy applications.

Hierarchical clustering is a popular unsupervised data analysis method. For many real-world applications, we would like to exploit prior information about the data that imposes constraints on the clustering hierarchy, and is not captured by the set of features available to the algorithm. This gives rise to the problem of "hierarchical clustering with structural constraints". Structural constraints pose major challenges for bottom-up approaches like average/single linkage and even though they can be naturally incorporated into top-down divisive algorithms, no formal guarantees exist on the quality of their output. In this paper, we provide provable approximation guarantees for two simple top-down algorithms, using a recently introduced optimization viewpoint of hierarchical clustering with pairwise similarity information [Dasgupta, 2016]. We show how to find good solutions even in the presence of conflicting prior information, by formulating a constraint-based regularization of the objective. We further explore a variation of this objective for dissimilarity information [Cohen-Addad et al., 2018] and improve upon current techniques. Finally, we demonstrate our approach on a real dataset for the taxonomy application.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes