Automatic Selection of t-SNE Perplexity
This work addresses a practical bottleneck for users of t-SNE in data visualization, offering an incremental improvement by automating hyperparameter tuning.
The paper replaces manual selection of the t-SNE perplexity hyperparameter with an automatic selection method, and empirically shows that the settings it chooses align with human expert preferences across multiple datasets.
t-Distributed Stochastic Neighbor Embedding (t-SNE) is one of the most widely used dimensionality reduction methods for data visualization, but it has a perplexity hyperparameter that requires manual selection. In practice, proper tuning of the t-SNE perplexity requires users to understand the inner workings of the method and to have hands-on experience. We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of t-SNE itself. We empirically validate that the perplexity settings found by our approach are consistent with preferences elicited from human experts across a number of datasets. We also analyze the similarities of our approach to the Bayesian information criterion (BIC) and minimum description length (MDL).
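As an illustration of what such a model selection objective might look like, the sketch below scores each candidate perplexity with a BIC-style criterion: twice the final t-SNE KL divergence plus a complexity penalty log(n) * Perp / n. The abstract does not state the objective, so this penalty form and the names `bic_style_score` and `select_perplexity` are assumptions made for illustration, not necessarily the paper's exact formulation.

```python
import math


def bic_style_score(kl_divergence, perplexity, n_samples):
    """Assumed BIC-style objective: data-fit term plus complexity penalty.

    kl_divergence: final KL(P || Q) of a t-SNE run at this perplexity.
    perplexity:    the candidate perplexity used for that run.
    n_samples:     number of data points n in the embedding.
    """
    return 2.0 * kl_divergence + math.log(n_samples) * perplexity / n_samples


def select_perplexity(kl_by_perplexity, n_samples):
    """Return the candidate perplexity with the lowest score.

    kl_by_perplexity maps each candidate perplexity to the final KL
    divergence of a t-SNE run at that perplexity.
    """
    if not kl_by_perplexity:
        raise ValueError("need at least one candidate perplexity")
    return min(
        kl_by_perplexity,
        key=lambda p: bic_style_score(kl_by_perplexity[p], p, n_samples),
    )
```

The KL divergence is already produced by each t-SNE run (for example, scikit-learn exposes it as the `kl_divergence_` attribute of a fitted `TSNE`), so a criterion of this kind adds only a few arithmetic operations per candidate.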