CV | AI | LG | Jun 15, 2025

Unsupervised Contrastive Learning Using Out-Of-Distribution Data for Long-Tailed Dataset

arXiv:2506.12698v1 | h-index: 12 | Neurocomputing
Originality: Incremental advance
AI Analysis

The paper addresses class imbalance in real-world data for applications such as image classification, and presents an incremental improvement over existing methods.

The paper tackles self-supervised learning on long-tailed datasets by leveraging out-of-distribution data to learn balanced and well-separated representations, and demonstrates improved performance over previous state-of-the-art methods on four public datasets.

This work addresses the task of self-supervised learning (SSL) on a long-tailed dataset that aims to learn balanced and well-separated representations for downstream tasks such as image classification. This task is crucial because the real world contains numerous object categories, and their distributions are inherently imbalanced. Towards robust SSL on a class-imbalanced dataset, we investigate leveraging a network trained using unlabeled out-of-distribution (OOD) data that are prevalently available online. We first train a network using both in-domain (ID) and sampled OOD data by back-propagating the proposed pseudo semantic discrimination loss alongside a domain discrimination loss. The OOD data sampling and loss functions are designed to learn a balanced and well-separated embedding space. Subsequently, we further optimize the network on ID data by unsupervised contrastive learning while using the previously trained network as a guiding network. The guiding network is utilized to select positive/negative samples and to control the strengths of attractive/repulsive forces in contrastive learning. We also distil and transfer its embedding space to the training network to maintain balancedness and separability. Through experiments on four publicly available long-tailed datasets, we demonstrate that the proposed method outperforms previous state-of-the-art methods.
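The guided contrastive step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the function name `guided_contrastive_loss`, the similarity threshold `pos_threshold`, and the rule that scales repulsion by `1 - guide_sim` are all assumptions chosen only to show the general idea. A frozen guiding network supplies embeddings `g`, which decide which pairs count as positives and how strongly the remaining pairs are pushed apart in the training network's embeddings `z`.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def guided_contrastive_loss(z, g, temperature=0.1, pos_threshold=0.8):
    """Contrastive loss on z, with pair roles taken from guide embeddings g.

    z: (N, D) embeddings from the network being trained.
    g: (N, D) embeddings of the same samples from a frozen guiding network.
    pos_threshold and the repulsion weighting are hypothetical choices.
    """
    z = l2_normalize(z)
    g = l2_normalize(g)
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Assumption: pairs the guide considers similar are treated as positives.
    guide_sim = g @ g.T
    pos_mask = (guide_sim >= pos_threshold) & off_diag

    # Assumption: repulsion on a non-positive pair grows as the guide
    # considers the two samples less similar. Positives keep weight 1.
    neg_weight = np.clip(1.0 - guide_sim, 0.0, 1.0)
    weights = np.where(pos_mask, 1.0, neg_weight) * off_diag

    logits = (z @ z.T) / temperature
    # Subtracting the row maximum cancels in the ratio and avoids overflow.
    logits = logits - logits.max(axis=1, keepdims=True)
    denom = (np.exp(logits) * weights).sum(axis=1)
    log_prob = logits - np.log(denom + 1e-12)[:, None]

    # Average over positives for each anchor that has at least one.
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())
```

In this sketch the loss is small when samples grouped together by the guide are also close in `z`, and large when they are not. The abstract also describes distilling the guiding network's embedding space into the training network, which this sketch does not cover.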

