HC LG | Apr 9, 2025

A Scalable Approach to Clustering Embedding Projections

Apple, CMU
arXiv:2504.07285v2 | 3 citations | h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in interactive data visualization for users analyzing embeddings, though it is incremental as it optimizes an existing task rather than introducing a new paradigm.

The paper tackles the computational cost of clustering points in order to label interactive embedding visualizations. It proposes clustering a kernel density estimate of the 2D projection rather than the individual points, which produces high-quality cluster regions in a few hundred milliseconds. The authors report this as orders of magnitude faster than existing approaches.

Interactive visualization of embedding projections is a useful technique for understanding data and evaluating machine learning models. Labeling data within these visualizations is critical for interpretation, as labels provide an overview of the projection and guide user navigation. However, most methods for producing labels require clustering the points, which can be computationally expensive as the number of points grows. In this paper, we describe an efficient clustering approach using kernel density estimation in the projected 2D space instead of points. This algorithm can produce high-quality cluster regions from a 2D density map in a few hundred milliseconds, orders of magnitude faster than current approaches. We contribute the design of the algorithm, benchmarks, and applications that demonstrate the utility of the algorithm, including labeling and summarization.
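To make the idea of clustering a density map rather than individual points concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm, whose details are not given in this summary. It assumes a simple stand-in: rasterize a Gaussian kernel density estimate onto a grid, keep cells above a fraction of the peak density, and label 4-connected regions. The function names (`density_map`, `cluster_regions`) and the parameters (`grid`, `bandwidth`, `rel_threshold`) are hypothetical and chosen for illustration only.

```python
import math
from collections import deque


def density_map(points, grid=32, bandwidth=1.5):
    """Rasterize a Gaussian KDE of 2D points onto a grid x grid map.

    `bandwidth` is in grid-cell units (an assumption of this sketch).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    # Scale data coordinates into [0, grid - 1]; guard against zero extent.
    sx = (grid - 1) / ((max(xs) - x0) or 1.0)
    sy = (grid - 1) / ((max(ys) - y0) or 1.0)
    dens = [[0.0] * grid for _ in range(grid)]
    radius = int(math.ceil(3 * bandwidth))  # truncate the kernel at ~3 sigma
    for x, y in points:
        cx, cy = (x - x0) * sx, (y - y0) * sy
        ix, iy = int(round(cx)), int(round(cy))
        for gy in range(max(0, iy - radius), min(grid, iy + radius + 1)):
            for gx in range(max(0, ix - radius), min(grid, ix + radius + 1)):
                d2 = (gx - cx) ** 2 + (gy - cy) ** 2
                dens[gy][gx] += math.exp(-d2 / (2 * bandwidth ** 2))
    return dens


def cluster_regions(dens, rel_threshold=0.2):
    """Label 4-connected regions whose density is >= rel_threshold * peak.

    Returns (labels, count); label 0 means "below threshold".
    """
    height, width = len(dens), len(dens[0])
    cutoff = rel_threshold * max(max(row) for row in dens)
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] or dens[y][x] < cutoff:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(x, y)])
            while queue:  # breadth-first flood fill over the density grid
                cx, cy = queue.popleft()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not labels[ny][nx] and dens[ny][nx] >= cutoff):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count
```

In this sketch the labeling step runs over grid cells only, so its cost depends on the grid resolution rather than on the number of points. That is the property the abstract credits for the speedup, though the thresholding rule used here is an assumption and real density maps would likely need a more careful region-extraction step.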

Code Implementations: 1 repo
