LGMLJun 22, 2019

Flattening a Hierarchical Clustering through Active Learning

arXiv:1906.09458v25 citations
AI Analysis

This work addresses the challenge of efficiently flattening hierarchical clusterings for applications in data analysis, though it appears incremental as it builds on known sampling procedures.

The paper tackles the problem of reconstructing hierarchical clusterings through active pairwise similarity queries, providing theoretical guarantees on query complexity and regret bounds, and demonstrates practical performance with linear-time implementations on real-world datasets.

We investigate active learning by pairwise similarity over the leaves of trees originating from hierarchical clustering procedures. In the realizable setting, we provide a full characterization of the number of queries needed to achieve perfect reconstruction of the tree cut. In the non-realizable setting, we rely on known important-sampling procedures to obtain regret and query complexity bounds. Our algorithms come with theoretical guarantees on the statistical error and, more importantly, lend themselves to linear-time implementations in the relevant parameters of the problem. We discuss such implementations, prove running time guarantees for them, and present preliminary experiments on real-world datasets showing the compelling practical performance of our algorithms as compared to both passive learning and simple active learning baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes