LG · Aug 18, 2021

Stochastic Cluster Embedding

arXiv:2108.08003v3 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of revealing hidden clusters in data visualizations produced by neighbor embedding techniques. It is aimed at researchers and practitioners who use these techniques and is assessed as an incremental advance.

The paper tackles the problem of visualizing large-scale patterns like clusters in neighbor embedding methods by proposing a new family of methods that generalizes Stochastic Neighbor Embedding with a scale parameter, resulting in consistent and substantial improvements in cluster visualization compared to state-of-the-art approaches.

Neighbor Embedding (NE) aims to preserve pairwise similarities between data items and has been shown to yield an effective principle for data visualization. However, even the best existing NE methods, such as Stochastic Neighbor Embedding (SNE), may leave large-scale patterns such as clusters hidden, despite strong signals being present in the data. To address this, we propose a new cluster visualization method based on the Neighbor Embedding principle. We first present a family of Neighbor Embedding methods that generalizes SNE by using non-normalized Kullback-Leibler divergence with a scale parameter. In this family, much better cluster visualizations often appear with a parameter value different from the one corresponding to SNE. We also develop efficient software that employs asynchronous stochastic block coordinate descent to optimize the new family of objective functions. Our experimental results demonstrate that the method consistently and substantially improves the visualization of data clusters compared with the state-of-the-art NE approaches.
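The abstract does not state the objective itself, so the following is only a minimal sketch of the idea, assuming the standard definition of the non-normalized KL divergence (also called I-divergence) between input similarities p and scaled output similarities s*q. The function names, the toy values, and the way the scale s enters are illustrative assumptions, not the paper's notation or software. The sketch checks one fact that holds under this definition: choosing s = sum(p) / sum(q) recovers the ordinary normalized KL divergence used by SNE-style objectives, while other scale values give a different objective.

```python
import math

def generalized_kl(p, q, s):
    """Non-normalized KL (I-divergence) between weights p and scaled weights s*q.

    Assumed standard form: sum_ij [ p_ij * log(p_ij / (s*q_ij)) - p_ij + s*q_ij ].
    """
    total = 0.0
    for p_ij, q_ij in zip(p, q):
        if p_ij > 0.0:
            total += p_ij * math.log(p_ij / (s * q_ij))
        total += s * q_ij - p_ij
    return total

def normalized_kl(p, q):
    """Ordinary KL divergence between p and q after normalizing q to sum to 1."""
    z = sum(q)
    return sum(
        p_ij * math.log(p_ij / (q_ij / z))
        for p_ij, q_ij in zip(p, q)
        if p_ij > 0.0
    )

# Hypothetical toy values: p sums to 1, q holds unnormalized output similarities.
p = [0.5, 0.3, 0.2]
q = [0.9, 0.4, 0.1]

# The scale that minimizes the generalized divergence for fixed p and q.
s_star = sum(p) / sum(q)

print(generalized_kl(p, q, s_star), normalized_kl(p, q))  # the two values agree
print(generalized_kl(p, q, 2.0 * s_star))                 # a different scale, a larger value
```

Under this assumed form, the scale acts as a free knob: fixing it at the minimizing value gives back the normalized objective, and moving it away changes the balance between the attractive term and the term that penalizes total output similarity.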

Code Implementations: 1 repo