MELG
Nov 30, 2021

Hierarchical clustering: visualization, feature importance and model selection

arXiv:2112.01372v2
55 citations
AI Analysis

This work gives data analysts better tools for visualization, feature importance, and model selection in hierarchical clustering. The contribution is incremental, since it builds on existing dendrogram concepts.

The paper proposes methods for analyzing hierarchical clustering that use the full dendrogram structure rather than requiring a single partition. The authors report that the resulting framework gives more insight than state-of-the-art approaches.

We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods, instead, use the full structure of the dendrogram. The key insight behind the proposed methods is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of a tree through an evolutionary model. Real and simulated datasets provide evidence that our proposed framework has desirable outcomes and gives more insights than state-of-the-art approaches. We provide an R package that implements our methods.
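To make the idea of assigning a feature value to each internal node concrete, here is a minimal Python sketch. It is not the authors' method or their R package: it assumes SciPy's `linkage` for the dendrogram and swaps in a simple size-weighted mean of the two child nodes as a stand-in for the evolutionary model described in the abstract. The function name `internal_node_values` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def internal_node_values(X, feature, method="average"):
    """Build a dendrogram and give every node a feature value.

    Leaves keep their observed value. Each internal node gets the
    size-weighted mean of its two children (an illustrative stand-in,
    NOT the evolutionary model used in the paper).
    """
    Z = linkage(X, method=method)      # (n - 1) x 4 merge table
    n = X.shape[0]
    values = np.empty(2 * n - 1)       # leaves first, then internal nodes
    sizes = np.ones(2 * n - 1)         # number of leaves under each node
    values[:n] = feature
    for i, (a, b, _dist, count) in enumerate(Z):
        a, b = int(a), int(b)
        node = n + i                   # SciPy's index for the i-th merge
        sizes[node] = count
        values[node] = (sizes[a] * values[a] + sizes[b] * values[b]) / count
    return Z, values


# Two well-separated groups of five points each (simulated data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
Z, values = internal_node_values(X, X[:, 0])

# The root's two children summarize the two groups without cutting the tree.
left, right = int(Z[-1, 0]), int(Z[-1, 1])
print(values[left], values[right], values[-1])
```

With this rule the root value equals the overall feature mean, and values at every level of the tree can be inspected directly, which is the kind of multi-resolution view the abstract contrasts with cutting the dendrogram into a single partition.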

Code Implementations: 1 repo