LG | CV | HC | Oct 18, 2022

Unsupervised visualization of image datasets using contrastive learning

arXiv:2210.09879v3 | 29 citations | h-index: 44
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of creating meaningful visualizations of image datasets, a problem relevant to researchers and practitioners in machine learning and data analysis. The advance is incremental: it builds on existing contrastive learning and embedding techniques.

The paper tackles the problem of visualizing high-dimensional image data by developing t-SimCNE, a method that combines contrastive learning and neighbor embeddings to produce 2D embeddings. These embeddings achieve classification accuracy comparable to state-of-the-art high-dimensional SimCLR representations.

Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet these approaches only produce meaningful results if the nearest neighbors themselves are meaningful. For images represented in pixel space this is not the case: distances in pixel space often fail to capture our sense of similarity, so nearest neighbors are not semantically close. This problem can be circumvented by self-supervised approaches based on contrastive learning, such as SimCLR, which rely on data augmentation to generate implicit neighbors, but these methods do not produce two-dimensional embeddings suitable for visualization. Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data. t-SimCNE combines ideas from contrastive learning and neighbor embeddings, and trains a parametric mapping from the high-dimensional pixel space into two dimensions. We show that the resulting 2D embeddings achieve classification accuracy comparable to the state-of-the-art high-dimensional SimCLR representations, thus faithfully capturing semantic relationships. Using t-SimCNE, we obtain informative visualizations of the CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and highlighting artifacts and outliers.
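
To make the combination of contrastive learning and neighbor embeddings more concrete, the following is a minimal NumPy sketch of a contrastive (InfoNCE-style) loss computed directly on 2D embeddings. It assumes a Cauchy similarity kernel, 1 / (1 + squared Euclidean distance), as used in t-SNE. The abstract does not spell out the loss, so the kernel choice, the function name `cauchy_infonce_loss`, and the toy inputs are illustrative assumptions rather than the authors' implementation. In a real setup, `z_a` and `z_b` would be the outputs of a trainable network applied to two augmentations of each image.

```python
import numpy as np


def cauchy_infonce_loss(z_a, z_b):
    """InfoNCE-style loss with a Cauchy kernel (illustrative assumption).

    z_a, z_b: arrays of shape (batch, dim). Row i of z_a and row i of z_b
    are embeddings of two augmentations of the same image (a positive pair).
    """
    z = np.concatenate([z_a, z_b], axis=0)  # shape (2B, dim)
    n = z.shape[0]
    b = n // 2

    # Pairwise squared Euclidean distances between all embeddings.
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)

    # Cauchy similarity: close to 1 for nearby points, heavy-tailed decay.
    sim = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(sim, 0.0)  # a point is not its own neighbor

    # The positive partner of row i is row (i + B) mod 2B.
    pos_idx = (np.arange(n) + b) % n
    pos = sim[np.arange(n), pos_idx]

    # Normalize by the similarity to all other points in the batch.
    denom = sim.sum(axis=1)
    return float(np.mean(-np.log(pos / denom)))


# Toy usage: four well-separated 2D points and slightly shifted copies.
z_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z_b = z_a + 0.1
loss_matched = cauchy_infonce_loss(z_a, z_b)
# Mismatched pairs: each point is paired with a copy of a different point.
loss_mismatched = cauchy_infonce_loss(z_a, np.roll(z_b, 1, axis=0))
print(loss_matched, loss_mismatched)
```

In this sketch the loss is small when positive pairs sit close together in the plane and far from everything else, and large when the pairing is scrambled. That is the property a 2D contrastive embedding would be trained to achieve.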

Code Implementations: 1 repo