Barnes-Hut-SNE
Barnes-Hut-SNE gives researchers and practitioners a scalable way to visualize large high-dimensional datasets, though the contribution is incremental in that it builds on existing t-SNE methods.
The paper tackles the computational inefficiency of t-SNE, which normally runs in O(N^2) time, by developing an O(N log N) implementation called Barnes-Hut-SNE that makes it possible to embed datasets with millions of objects.
The paper presents an O(N log N)-implementation of t-SNE -- an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2). The new implementation uses vantage-point trees to compute sparse pairwise similarities between the input data objects, and it uses a variant of the Barnes-Hut algorithm - an algorithm used by astronomers to perform N-body simulations - to approximate the forces between the corresponding points in the embedding. Our experiments show that the new algorithm, called Barnes-Hut-SNE, leads to substantial computational advantages over standard t-SNE, and that it makes it possible to learn embeddings of data sets with millions of objects.
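To make the Barnes-Hut idea concrete, here is a minimal sketch of how the repulsive part of the t-SNE gradient could be approximated in a 2-D embedding. It is not the paper's implementation. The names `Cell`, `build_tree`, `repulsion` and `exact_repulsion` are hypothetical. The acceptance test (cell width divided by distance below `theta`) is the textbook Barnes-Hut criterion and is assumed here rather than taken from the paper's variant. The vantage-point-tree step for the sparse input similarities is not covered.

```python
import random

MIN_HALF = 1e-9  # stop splitting below this size so duplicate points cannot recurse forever


class Cell:
    """Square quadtree cell tracking the count and centre of mass of its points."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.n = 0
        self.mx = self.my = 0.0
        self.children = None

    def _child(self, x, y):
        # Children are ordered (-,-), (+,-), (-,+), (+,+) relative to the cell centre.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y):
        if self.children is None and self.n >= 1 and self.half > MIN_HALF:
            # Occupied leaf: split it and push the stored point down one level.
            q = self.half / 2
            self.children = [Cell(self.cx + sx * q, self.cy + sy * q, q)
                             for sy in (-1, 1) for sx in (-1, 1)]
            self._child(self.mx, self.my).insert(self.mx, self.my)
        if self.children is not None:
            self._child(x, y).insert(x, y)
        # Update the running centre of mass and the point count.
        self.mx = (self.mx * self.n + x) / (self.n + 1)
        self.my = (self.my * self.n + y) / (self.n + 1)
        self.n += 1


def build_tree(points, half=1.0):
    """Build a quadtree over points assumed to lie in [-half, half]^2."""
    root = Cell(0.0, 0.0, half)
    for x, y in points:
        root.insert(x, y)
    return root


def repulsion(cell, x, y, theta):
    """Return (fx, fy, z): unnormalised repulsive force on (x, y) and its share of Z."""
    if cell.n == 0:
        return 0.0, 0.0, 0.0
    dx, dy = x - cell.mx, y - cell.my
    d2 = dx * dx + dy * dy
    leaf = cell.children is None
    if leaf and d2 == 0.0:
        # The query point itself (plus any exact duplicates, which exert no force).
        return 0.0, 0.0, float(cell.n - 1)
    if leaf or (2 * cell.half) ** 2 < theta * theta * d2:
        # Far enough away: treat the whole cell as one point at its centre of mass.
        w = 1.0 / (1.0 + d2)
        return cell.n * w * w * dx, cell.n * w * w * dy, cell.n * w
    fx = fy = z = 0.0
    for child in cell.children:
        cfx, cfy, cz = repulsion(child, x, y, theta)
        fx, fy, z = fx + cfx, fy + cfy, z + cz
    return fx, fy, z


def exact_repulsion(points, i):
    """O(N) reference computation of the same quantities for point i."""
    xi, yi = points[i]
    fx = fy = z = 0.0
    for j, (xj, yj) in enumerate(points):
        if j == i:
            continue
        dx, dy = xi - xj, yi - yj
        w = 1.0 / (1.0 + dx * dx + dy * dy)
        fx, fy, z = fx + w * w * dx, fy + w * w * dy, z + w
    return fx, fy, z


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
root = build_tree(points)
z_exact = sum(exact_repulsion(points, i)[2] for i in range(len(points)))
z_approx = sum(repulsion(root, x, y, 0.5)[2] for x, y in points)
print("Z exact:", z_exact, "Z Barnes-Hut:", z_approx)
```

Dividing the summed forces by the normalisation term Z gives the repulsive part of the gradient. Setting `theta` to 0 disables the cell summaries and reproduces the exact sums. Larger values trade accuracy for fewer cell visits.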