LG · Oct 9, 2025

t-SNE Exaggerates Clusters, Provably

arXiv:2510.07746v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses a critical issue for anyone who uses t-SNE for data visualization and analysis: it reveals fundamental limitations in a widely used tool.

The paper proves that t-SNE visualizations can misrepresent the strength of input clusters and the extremity of outliers, making them unreliable for inferring these properties, and demonstrates these failures in practice.

Central to the widespread use of t-distributed stochastic neighbor embedding (t-SNE) is the conviction that it produces visualizations whose structure roughly matches that of the input. To the contrary, we prove that (1) the strength of the input clustering, and (2) the extremity of outlier points, cannot be reliably inferred from the t-SNE output. We demonstrate the prevalence of these failure modes in practice as well.
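
For reference, the "input" and "output" in the abstract can be read against the standard t-SNE objective of van der Maaten and Hinton, sketched below. The abstract does not restate it, and whether the paper analyzes exactly this form is an assumption here. The symbols are the usual ones: $x_i$ are the $n$ input points, $y_i$ their low-dimensional embeddings, and $\sigma_i$ a per-point bandwidth chosen so that the perplexity of $p_{\cdot|i}$ matches a user-set value.

```latex
% Input affinities (Gaussian kernel, per-point bandwidth \sigma_i), with p_{i|i} = 0
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Output affinities (Student-t kernel with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective minimized over the embedding y_1, \dots, y_n
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The embedding sees the input only through the normalized affinities $p_{ij}$, not through raw distances, which is the setting in which questions about recovering cluster strength or outlier extremity from the output arise.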

