HC · LG · Aug 26, 2023

Class-constrained t-SNE: Combining Data Features and Class Probabilities

arXiv:2308.13837v1 · 26 citations · h-index: 38
Originality: Incremental advance
AI Analysis

The paper addresses a need among machine learning researchers and practitioners for integrated analysis of model outputs and data characteristics. The advance is incremental because it builds on existing t-SNE methods.

Data features and class probabilities are usually analyzed in separate dimensionality reduction views. The paper proposes class-constrained t-SNE, which combines both perspectives in a single result through a balanced cost function and an interactive, user-adjustable parameter, and it illustrates applications in model evaluation and visual-interactive labeling.

Data features and class probabilities are two main perspectives when, e.g., evaluating model results and identifying problematic items. Class probabilities represent the likelihood that each instance belongs to a particular class, and they can be produced by probabilistic classifiers or even by human labeling with uncertainty. Since both perspectives are multi-dimensional data, dimensionality reduction (DR) techniques are commonly used to extract informative characteristics from them. However, existing methods either focus solely on the data feature perspective or rely on class probability estimates to guide the DR process. In contrast to previous work where separate views are linked to conduct the analysis, we propose a novel approach, class-constrained t-SNE, that combines data features and class probabilities in the same DR result. Specifically, we combine them by balancing two corresponding components in a cost function to optimize the positions of data points and of iconic representations of classes, called class landmarks. Furthermore, an interactive user-adjustable parameter balances these two components so that users can weight the perspectives according to their interest, and it enables a smooth visual transition between varying perspectives to preserve the mental map. We illustrate its application potential in model evaluation and visual-interactive labeling. A comparative analysis is performed to evaluate the DR results.
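
The balancing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the cost function, so everything below is an assumption rather than the authors' formulation: the names `feature_term`, `class_term`, `combined_cost`, and `alpha` are hypothetical, both components are modeled as KL divergences with a Student-t kernel in the embedding (as in standard t-SNE), and the convex weighting `alpha * feature + (1 - alpha) * class` is only one plausible way to balance them. The high-dimensional similarities `P_feat` are taken as given instead of being calibrated by perplexity. Only the standard library is used to keep the sketch self-contained.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kl_divergence(P, Q, eps=1e-12):
    """Sum of p * log(p / q) over all entries of two same-shaped nested lists."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p_row, q_row in zip(P, Q)
        for p, q in zip(p_row, q_row)
        if p > 0.0
    )


def feature_term(P_feat, Y):
    """Assumed feature component: KL between given pairwise similarities
    P_feat and Student-t similarities of the embedded points Y."""
    n = len(Y)
    W = [
        [0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in W)
    Q = [[w / total for w in row] for row in W]
    return kl_divergence(P_feat, Q)


def class_term(P_class, Y, landmarks):
    """Assumed class component: KL between each point's class probabilities
    and its normalized Student-t similarities to the class landmarks."""
    Q = []
    for y in Y:
        w = [1.0 / (1.0 + sq_dist(y, lm)) for lm in landmarks]
        s = sum(w)
        Q.append([v / s for v in w])
    return kl_divergence(P_class, Q)


def combined_cost(P_feat, P_class, Y, landmarks, alpha):
    """Hypothetical balance: alpha = 1 uses only data features,
    alpha = 0 uses only class probabilities."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * feature_term(P_feat, Y) + (1.0 - alpha) * class_term(
        P_class, Y, landmarks
    )


# Toy example: three embedded points, two class landmarks.
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
landmarks = [[0.0, 0.0], [0.0, 2.0]]
P_feat = [[0.0, 0.25, 0.10], [0.25, 0.0, 0.15], [0.10, 0.15, 0.0]]  # sums to 1
P_class = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]  # each row sums to 1

for alpha in (0.0, 0.5, 1.0):
    print(alpha, combined_cost(P_feat, P_class, Y, landmarks, alpha))
```

In an interactive setting, a slider would change `alpha` and the optimizer would continue from the current positions of points and landmarks. Under these assumptions, that warm start is what would keep the transition between perspectives smooth.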

Code Implementations: 1 repo
