ML | LG | Mar 19, 2025

Hierarchical clustering with maximum density paths and mixture models

arXiv:2503.15582v2 | h-index: 13 | Has Code
Originality: Highly original
AI Analysis

The paper addresses a bottleneck in hierarchical clustering of high-dimensional data and offers a probabilistically grounded tool for exploratory data analysis.

The paper tackles the problem of hierarchical clustering in high-dimensional data with unclear density gaps, introducing t-NEB, which achieves state-of-the-art clustering performance on naturalistic high-dimensional data.

Hierarchical clustering is an effective, interpretable method for analyzing structure in data. It reveals insights at multiple scales without requiring a predefined number of clusters and captures nested patterns and subtle relationships, which are often missed by flat clustering approaches. However, existing hierarchical clustering methods struggle with high-dimensional data, especially when there are no clear density gaps between modes. In this work, we introduce t-NEB, a probabilistically grounded hierarchical clustering method, which yields state-of-the-art clustering performance on naturalistic high-dimensional data. t-NEB consists of three steps: (1) density estimation via overclustering; (2) finding maximum density paths between clusters; (3) creating a hierarchical structure via bottom-up cluster merging. t-NEB uses a probabilistic parametric density model for both overclustering and cluster merging, which yields both high clustering performance and a meaningful hierarchy, making it a valuable tool for exploratory data analysis. Code is available at https://github.com/ecker-lab/tneb clustering.
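Steps (2) and (3) of the pipeline can be illustrated with a small sketch. This is not the authors' implementation, and several things in it are assumptions made for illustration. It uses an isotropic Gaussian mixture with fixed, hand-picked components in place of the paper's fitted density model. It uses a straight line between component means in place of an optimised maximum density path. It merges on the lowest density found along that line, which is a criterion the abstract does not specify. The function names `mixture_density`, `path_min_density`, and `merge_order` are hypothetical.

```python
import math
from itertools import combinations


def mixture_density(x, weights, means, sigma):
    """Density of an isotropic Gaussian mixture at point x (assumed model)."""
    d = len(x)
    norm = (2.0 * math.pi * sigma ** 2) ** (-d / 2.0)
    total = 0.0
    for w, m in zip(weights, means):
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, m))
        total += w * norm * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total


def path_min_density(a, b, weights, means, sigma, steps=50):
    """Lowest density sampled on the straight segment from a to b.

    A straight segment is a crude stand-in for a maximum density path.
    """
    lowest = math.inf
    for k in range(steps + 1):
        t = k / steps
        point = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        lowest = min(lowest, mixture_density(point, weights, means, sigma))
    return lowest


def merge_order(weights, means, sigma):
    """Bottom-up merging: join the pair of clusters whose connecting path
    has the highest minimum density first (assumed merge criterion)."""
    n = len(means)
    edges = sorted(
        (
            (path_min_density(means[i], means[j], weights, means, sigma), i, j)
            for i, j in combinations(range(n), 2)
        ),
        reverse=True,
    )
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    for score, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            merges.append((i, j, score))
    return merges


# Three toy components: two close together, one far away.
toy_means = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
toy_weights = [1 / 3, 1 / 3, 1 / 3]
merges = merge_order(toy_weights, toy_means, sigma=1.0)
for i, j, score in merges:
    print(f"merge components {i} and {j} (path minimum density {score:.3g})")
```

In this toy setup the two nearby components are joined first because the density never drops far between them, while the distant component is attached last across a low-density gap. The resulting merge sequence is the kind of hierarchy that a dendrogram would display.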

Code Implementations: 1 repo
