LGCOMar 31, 2025

Tree-Guided $L_1$-Convex Clustering

arXiv:2503.24012v1
Originality Incremental advance
AI Analysis

This addresses a scalability bottleneck for researchers and practitioners using convex clustering on large datasets, though it is incremental as it builds on existing convex clustering methods.

The paper tackles the computational challenge of obtaining a complete dendrogram for large-scale datasets in convex clustering by developing the Tree-Guided L1-Convex Clustering (TGCC) algorithm, which constructs a clusterpath for 10^6 points in 15 seconds on a standard laptop without sacrificing performance.

Convex clustering is a modern clustering framework that guarantees globally optimal solutions and performs comparably to other advanced clustering methods. However, obtaining a complete dendrogram (clusterpath) for large-scale datasets remains computationally challenging due to the extensive costs associated with iterative optimization approaches. To address this limitation, we develop a novel convex clustering algorithm called Tree-Guided $L_1$-Convex Clustering (TGCC). We first focus on the fact that the loss function of $L_1$-convex clustering with tree-structured weights can be efficiently optimized using a dynamic programming approach. We then develop an efficient cluster fusion algorithm that utilizes the tree structure of the weights to accelerate the optimization process and eliminate the issue of cluster splits commonly observed in convex clustering. By combining the dynamic programming approach with the cluster fusion algorithm, the TGCC algorithm achieves superior computational efficiency without sacrificing clustering performance. Remarkably, our TGCC algorithm can construct a complete clusterpath for $10^6$ points in $\mathbb{R}^2$ within 15 seconds on a standard laptop without the need for parallel or distributed computing frameworks. Moreover, we extend the TGCC algorithm to develop biclustering and sparse convex clustering algorithms.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes