LG, ML, Feb 13, 2020

Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE

arXiv:2002.05687v1, 9 citations, Has Code
AI Analysis

This work addresses the need for better exploratory data analysis tools in fields such as biology. It offers a method for hierarchical visualization and clustering that does not require a predefined number of clusters, though the contribution is incremental because it builds on existing t-SNE and clustering techniques.

The paper tackles hierarchical clustering and visualization by combining t-SNE and hierarchical clustering into tree-SNE. It introduces alpha-clustering, which recommends a cluster assignment without prior knowledge of the number of clusters, and demonstrates both methods on handwritten digits, mass cytometry (CyTOF) data, and single-cell RNA-sequencing data. Alpha-clustering also achieves unsupervised clustering results competitive with the state of the art on several image data sets.

t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in speeding up t-SNE and obtaining finer-grained structure, we combine the two to create tree-SNE, a hierarchical clustering and visualization algorithm based on stacked one-dimensional t-SNE embeddings. We also introduce alpha-clustering, which recommends the optimal cluster assignment, without foreknowledge of the number of clusters, based on the cluster stability across multiple scales. We demonstrate the effectiveness of tree-SNE and alpha-clustering on images of handwritten digits, mass cytometry (CyTOF) data from blood cells, and single-cell RNA-sequencing (scRNA-seq) data from retinal cells. Furthermore, to demonstrate the validity of the visualization, we use alpha-clustering to obtain unsupervised clustering results competitive with the state of the art on several image data sets. Software is available at https://github.com/isaacrob/treesne.
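
The idea of picking a cluster assignment by its stability across scales can be illustrated with a small sketch. This is not the code from the linked repository. It assumes the stacked one-dimensional embeddings are already available as plain lists of floats (one list per level), it swaps in a simple gap-threshold rule to cluster each level, and it reads "stability" as the cluster count that persists over the longest run of consecutive levels. The function names `gap_clusters` and `most_stable_level` and the `gap` parameter are hypothetical.

```python
from itertools import groupby


def gap_clusters(values, gap):
    """Label 1-D points; a new cluster starts wherever two neighbours
    in sorted order are more than `gap` apart (a stand-in clustering rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > gap:
            current += 1
        labels[nxt] = current
    return labels


def most_stable_level(levels, gap):
    """Return (start_index, run_length, n_clusters) for the longest run of
    consecutive levels that share the same number of clusters.
    Ties are resolved in favour of the earliest run."""
    counts = [len(set(gap_clusters(level, gap))) for level in levels]
    best = (0, 0, 0)
    start = 0
    for n_clusters, run in groupby(counts):
        length = len(list(run))
        if length > best[1]:
            best = (start, length, n_clusters)
        start += length
    return best


# Hand-made toy "levels" standing in for stacked 1-D embeddings (assumption).
toy_levels = [
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
    [0.0, 0.1, 0.2, 6.0, 6.1, 6.2],
    [0.0, 0.1, 0.3, 7.0, 7.1, 7.3],
    [0.0, 0.1, 3.0, 7.0, 7.1, 10.0],
]
start, length, n_clusters = most_stable_level(toy_levels, gap=1.0)
print(start, length, n_clusters)
```

On the toy data the two-cluster split holds for three consecutive levels, so it is the one this sketch would recommend. A real pipeline would replace the toy lists with actual embeddings and would likely use a more careful clustering rule than a fixed gap.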

Code Implementations: 1 repo
