LG | ME | ML | Aug 27, 2025

Cluster and then Embed: A Modular Approach for Visualization

arXiv:2509.03373v1 | 1 citation | h-index: 31
Originality: Incremental advance
AI Analysis

The paper addresses visualization challenges for researchers and practitioners working with clustered data. Its contribution is incremental because it builds on existing methods.

The paper tackles the problem of global geometry distortion in dimensionality reduction methods like t-SNE and UMAP by proposing a modular approach that clusters data first, embeds each cluster, and aligns them, showing competitive results on synthetic and real-world datasets.

Dimensionality reduction methods such as t-SNE and UMAP are popular for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods while being much more transparent.
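
The three steps named in the abstract (cluster, embed each cluster, align) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering, embedding, or alignment methods the paper uses. The choices here are assumptions made for illustration only: k-means for clustering, PCA for the per-cluster embedding, and alignment by translating each cluster to the 2D PCA position of its centroid. The function name `cluster_then_embed` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def cluster_then_embed(X, n_clusters=3, random_state=0):
    """Illustrative sketch only: cluster, embed each cluster in 2D, then align.

    All method choices (k-means, PCA, centroid-based translation) are
    assumptions for this example, not the paper's actual procedure.
    """
    # Step 1: cluster the data (k-means is an assumed choice).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)

    # Preparation for step 3: place the cluster centroids in 2D with PCA,
    # so the relative positions of clusters reflect their global layout.
    anchors = PCA(n_components=2).fit_transform(km.cluster_centers_)

    Y = np.zeros((X.shape[0], 2))
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        # Step 2: embed each cluster on its own (PCA is an assumed choice).
        # PCA output is centered, so each local embedding has zero mean.
        local = PCA(n_components=2).fit_transform(X[idx])
        # Step 3: align by translating the local embedding to its anchor.
        Y[idx] = local + anchors[k]
    return Y, labels


# Usage on synthetic data: three Gaussian blobs in 10 dimensions.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)
Y, labels = cluster_then_embed(X, n_clusters=3)
```

Because each stage is a separate call, any one of them can be swapped out (for example, a different clusterer, or a rotation-aware alignment) without touching the others, which is the kind of modularity the abstract emphasizes.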
