Conditional t-SNE: Complementary t-SNE embeddings through factoring out prior information
This is an incremental improvement for researchers and practitioners using dimensionality reduction for data visualization, as it extends t-SNE to handle known structure without requiring multiple embeddings.
The paper tackles the problem of visualizing high-dimensional data when prior information (like labels) is known, which can obscure complementary structure in standard t-SNE embeddings, by introducing conditional t-SNE (ct-SNE) to factor out this prior knowledge, resulting in effective capture of new insights as shown in empirical results on synthetic and real data.
Dimensionality reduction and manifold learning methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) are routinely used to map high-dimensional data into a 2-dimensional space to visualize and explore the data. However, two dimensions are typically insufficient to capture all structure in the data, the salient structure is often already known, and it is not obvious how to extract the remaining information in a similarly effective manner. To fill this gap, we introduce \emph{conditional t-SNE} (ct-SNE), a generalization of t-SNE that discounts prior information from the embedding in the form of labels. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining a single, integrated, and elegant method. ct-SNE has one extra parameter over t-SNE; we investigate its effects and show how to efficiently optimize the objective. Factoring out prior knowledge allows complementary structure to be captured in the embedding, providing new insights. Qualitative and quantitative empirical results on synthetic and (large) real data show ct-SNE is effective and achieves its goal.