Visualizing and Exploring Dynamic High-Dimensional Datasets with LION-tSNE
LION-tSNE addresses a limitation for users who need to update visualizations of dynamic datasets, such as dashboards, though the contribution is incremental because it builds on tSNE.
The paper tackles the problem of tSNE's inability to incorporate new data into existing visualizations, proposing LION-tSNE, a method based on local interpolation and outlier control that shows robustness to outliers and new samples from existing clusters.
t-distributed stochastic neighbor embedding (tSNE) is a popular and prize-winning approach for dimensionality reduction and visualizing high-dimensional data. However, tSNE is non-parametric: once a visualization is built, tSNE is not designed to incorporate additional data into the existing representation. This severely limits the applicability of tSNE to scenarios where data are added or updated over time (such as dashboards or series of data snapshots). In this paper we propose, analyze and evaluate LION-tSNE (Local Interpolation with Outlier coNtrol), a novel approach for incorporating new data into a tSNE representation. LION-tSNE is based on local interpolation in the vicinity of training data, outlier detection and a special outlier mapping algorithm. We show that the LION-tSNE method is robust both to outliers and to new samples from existing clusters. We also discuss multiple possible improvements for special cases. We compare LION-tSNE to a comprehensive list of possible benchmark approaches that include multiple interpolation techniques, gradient descent for new data, and neural network approximation.
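The idea of combining local interpolation with outlier control can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: the abstract does not specify the interpolation scheme or the outlier mapping, so the sketch assumes inverse distance weighting over training samples within a fixed radius, and it substitutes a deliberately simple placeholder for outlier placement (a position just outside the bounding box of the existing embedding). The function name `embed_new_point` and the parameters `radius_x`, `power` and `margin` are hypothetical.

```python
import numpy as np


def embed_new_point(x_new, X_train, Y_train, radius_x, power=2.0, margin=1.0):
    """Place one new sample into an existing, fixed 2-D embedding.

    X_train: (n, d) array of the original high-dimensional samples.
    Y_train: (n, 2) array of their embedding coordinates (e.g. from tSNE).
    radius_x, power, margin: hypothetical tuning parameters (assumptions).
    Returns (y_new, is_outlier).
    """
    # Distance from the new sample to every training sample.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    near = dists <= radius_x

    if not near.any():
        # Outlier: no training sample lies within radius_x.
        # Placeholder rule (an assumption, not the paper's mapping):
        # put the point just outside the embedding's bounding box so
        # that it cannot be mistaken for a member of any cluster.
        return Y_train.max(axis=0) + margin, True

    # A new sample that coincides with a training sample reuses its position.
    exact = dists[near] == 0.0
    if exact.any():
        return Y_train[near][exact][0].copy(), False

    # Local inverse-distance-weighted interpolation over the neighbors only.
    weights = dists[near] ** (-power)
    y_new = (weights[:, None] * Y_train[near]).sum(axis=0) / weights.sum()
    return y_new, False
```

Restricting the interpolation to nearby training samples keeps a new point from an existing cluster inside that cluster, while the explicit outlier branch prevents a distant sample from being pulled into a cluster it does not belong to. A practical outlier mapping would also need to keep several outliers apart from each other, which this placeholder does not attempt.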