Stochastic Neighbor Embedding under f-divergences
This work offers data visualization researchers an incremental improvement: it generalizes t-SNE to f-divergences and proposes optimizing the resulting loss through a variational bound.
The authors tackled the problem of extending t-SNE to use f-divergences instead of just KL divergence, finding that different divergences perform better for capturing different types of latent structures like manifolds, clusters, and hierarchies. They also proposed optimizing these divergences via a variational bound, which improved embedding results compared to the original t-SNE.
The t-distributed Stochastic Neighbor Embedding (t-SNE) is a powerful and popular method for visualizing high-dimensional data. It minimizes the Kullback-Leibler (KL) divergence between the original and embedded data distributions. In this work, we propose extending this method to other f-divergences. We analytically and empirically evaluate the types of latent structure (manifold, cluster, and hierarchical) that are well captured by the original KL divergence and by the proposed f-divergence generalization, and find that different divergences perform better for different types of structure. A common concern with the t-SNE criterion is that it is optimized using gradient descent and can become stuck in poor local minima. We propose optimizing the f-divergence-based loss criteria by minimizing a variational bound. This typically performs better than optimizing the primal form, and our experiments show that it can also improve upon the embedding results obtained from the original t-SNE criterion.
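To make the generalization concrete, the following minimal sketch evaluates the primal form of an f-divergence, D_f(P || Q) = sum_ij q_ij f(p_ij / q_ij), between a high-dimensional similarity distribution P and an embedding distribution Q. Several details here are assumptions for illustration and are not taken from the abstract: the particular generators listed (KL, reverse KL, Jensen-Shannon, chi-square, Hellinger), the fixed-bandwidth Gaussian used for P (standard t-SNE calibrates a per-point bandwidth from a perplexity target), and all function names. The variational bound proposed in the abstract is not implemented.

```python
import numpy as np

def f_divergence(p, q, f):
    """Primal f-divergence D_f(P || Q) = sum q * f(p / q) for strictly positive p, q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

# Convex generators with f(1) = 0 (an illustrative selection, assumed here).
GENERATORS = {
    "kl": lambda t: t * np.log(t),                      # sum p log(p/q), the t-SNE loss
    "reverse_kl": lambda t: -np.log(t),                 # sum q log(q/p)
    "jensen_shannon": lambda t: 0.5 * (t * np.log(t) - (t + 1) * np.log((t + 1) / 2)),
    "chi_square": lambda t: (t - 1) ** 2,               # sum (p - q)^2 / q
    "hellinger": lambda t: (np.sqrt(t) - 1) ** 2,       # sum (sqrt p - sqrt q)^2
}

def _normalized_off_diagonal(weights):
    """Drop the diagonal (i == j) and normalize the remaining pair weights to sum to 1."""
    mask = ~np.eye(weights.shape[0], dtype=bool)
    w = weights[mask]
    return w / w.sum()

def pairwise_sq_dists(z):
    """Squared Euclidean distances between all rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def gaussian_similarities(x, sigma=1.0):
    """Simplified P: Gaussian kernel with one global bandwidth (an assumption)."""
    return _normalized_off_diagonal(np.exp(-pairwise_sq_dists(x) / (2.0 * sigma ** 2)))

def student_t_similarities(y):
    """Q as in t-SNE: Student-t kernel with one degree of freedom."""
    return _normalized_off_diagonal(1.0 / (1.0 + pairwise_sq_dists(y)))

# Toy usage: compare a random 2-D embedding against random 5-D data.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = rng.normal(size=(8, 2))
p, q = gaussian_similarities(x), student_t_similarities(y)
for name, f in GENERATORS.items():
    print(f"{name}: {f_divergence(p, q, f):.4f}")
```

Swapping the generator changes which mismatches between P and Q are penalized most heavily, which is one way to read the abstract's finding that different divergences suit different types of structure.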