LG | Nov 30, 2021

CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data

arXiv:2111.15037v3 | 37 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of visualizing hyperbolic embeddings for machine learning researchers and practitioners. It offers a domain-specific tool that is an incremental advance in originality but delivers strong gains over existing methods.

The paper tackles the visualization of high-dimensional hyperbolic data, which is difficult because the optimization is non-trivial and hyperbolic space is inhomogeneous. It proposes CO-SNE, an extension of t-SNE to hyperbolic space that uses hyperbolic normal and hyperbolic Cauchy distributions and preserves each point's distance to the origin. The reported results show that CO-SNE significantly outperforms existing visualization tools such as PCA, t-SNE, UMAP, and HoroPCA in preserving hyperbolic characteristics.
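To make the inhomogeneity and the distance-to-origin idea concrete, here is a minimal Python sketch. It assumes the Poincaré ball model with curvature -1, which this summary does not state, and the function names `poincare_distance` and `distance_to_origin` are illustrative rather than taken from the paper's code.

```python
import numpy as np


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points of the unit Poincare ball.

    Assumption: curvature -1 and Euclidean norms strictly below 1.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    # The denominator shrinks toward 0 as either point nears the boundary,
    # so the same Euclidean gap corresponds to a larger hyperbolic distance.
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))


def distance_to_origin(x):
    """Hyperbolic distance from x to the centre of the ball."""
    x = np.asarray(x, dtype=float)
    return poincare_distance(x, np.zeros_like(x))


if __name__ == "__main__":
    # Two pairs of points with the same Euclidean gap (0.09),
    # one pair near the origin and one pair near the boundary.
    near_origin = poincare_distance([0.00, 0.0], [0.09, 0.0])
    near_boundary = poincare_distance([0.90, 0.0], [0.99, 0.0])
    print(near_origin, near_boundary)
```

In this model the distance to the origin has the closed form 2·artanh(‖x‖), so a point at Euclidean radius 0.5 lies at hyperbolic distance ln 3 ≈ 1.0986, and the distance grows without bound as the radius approaches 1. That is why a visualization that ignores distances to the origin can lose the hierarchy encoded in the data.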

Hyperbolic space can naturally embed hierarchies that often exist in real-world data and semantics. While high-dimensional hyperbolic embeddings lead to better representations, most hyperbolic models utilize low-dimensional embeddings, due to non-trivial optimization and visualization of high-dimensional hyperbolic data. We propose CO-SNE, which extends the Euclidean space visualization tool, t-SNE, to hyperbolic space. Like t-SNE, it converts distances between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of high-dimensional data $X$ and low-dimensional embedding $Y$. However, unlike Euclidean space, hyperbolic space is inhomogeneous: a volume could contain a lot more points at a location far from the origin. CO-SNE thus uses hyperbolic normal distributions for $X$ and hyperbolic \underline{C}auchy instead of t-SNE's Student's t-distribution for $Y$, and it additionally seeks to preserve $X$'s individual distances to the \underline{O}rigin in $Y$. We apply CO-SNE to naturally hyperbolic data and supervisedly learned hyperbolic features. Our results demonstrate that CO-SNE deflates high-dimensional hyperbolic data into a low-dimensional space without losing their hyperbolic characteristics, significantly outperforming popular visualization tools such as PCA, t-SNE, UMAP, and HoroPCA, which is also designed for hyperbolic data.
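The objective described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the Poincaré ball model, a single shared bandwidth `sigma` in place of perplexity-calibrated per-point bandwidths, a Cauchy-shaped kernel with scale `gamma`, and a weight `lam` on the origin term. All function names and default values are hypothetical, and the paper's exact form of the distance-to-origin penalty may differ.

```python
import numpy as np

# Illustrative sketch only: kernel forms, names, and default hyperparameters
# are assumptions made for this example, not the paper's code.


def pairwise_poincare(Z, eps=1e-9):
    """Pairwise geodesic distances between rows of Z in the unit Poincare ball."""
    sq_norm = np.sum(Z ** 2, axis=1)
    sq_diff = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    denom = np.maximum((1.0 - sq_norm)[:, None] * (1.0 - sq_norm)[None, :], eps)
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)


def joint_probabilities(weights):
    """Zero the diagonal and normalise pairwise weights so they sum to 1."""
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)
    return w / w.sum()


def kl_term(X, Y, sigma=1.0, gamma=0.1, eps=1e-12):
    """KL(P || Q) between high- and low-dimensional joint probabilities."""
    d_x = pairwise_poincare(X)
    d_y = pairwise_poincare(Y)
    # High-dimensional side: Gaussian-shaped kernel in hyperbolic distance
    # (assumed stand-in for the hyperbolic normal distribution).
    P = joint_probabilities(np.exp(-d_x ** 2 / (2.0 * sigma ** 2)))
    # Low-dimensional side: Cauchy-shaped kernel in hyperbolic distance
    # (assumed stand-in for the hyperbolic Cauchy distribution).
    Q = joint_probabilities(gamma / (d_y ** 2 + gamma ** 2))
    mask = P > 0  # skips the diagonal and any underflowed entries
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


def origin_term(X, Y):
    """Mean squared mismatch between hyperbolic distances to the origin."""
    r_x = np.clip(np.linalg.norm(X, axis=1), 0.0, 1.0 - 1e-9)
    r_y = np.clip(np.linalg.norm(Y, axis=1), 0.0, 1.0 - 1e-9)
    # In the Poincare ball, the distance to the origin is 2 * artanh(radius).
    return float(np.mean((2.0 * np.arctanh(r_x) - 2.0 * np.arctanh(r_y)) ** 2))


def co_sne_style_loss(X, Y, lam=1.0, **kernel_kwargs):
    """KL term plus a weighted distance-to-origin penalty (assumed combination)."""
    return kl_term(X, Y, **kernel_kwargs) + lam * origin_term(X, Y)
```

A full method would minimize such a loss over $Y$ with an optimizer that keeps every row of $Y$ inside the ball. That step is omitted here because the abstract does not describe it.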

