CV LG | May 11, 2025

Unsupervised Learning for Class Distribution Mismatch

Pan Du, Wangbo Zhao, Xinai Lu, Nian Liu, Zhikai Li, Chaoyu Gong, Suyun Zhao, Hong Chen, Cuiping Li, Kai Wang, Yang You

arXiv:2505.06948v18.42 citations | h-index: 27 | Has Code | ICML

Originality: Incremental advance

AI Analysis

The paper addresses a limitation of earlier semi-supervised methods, namely their heavy reliance on labeled data, and so extends to settings where only unlabeled data is available. The contribution appears incremental because it builds on existing class distribution mismatch frameworks.

The paper tackles class distribution mismatch between training data and target tasks. It proposes an unsupervised method that uses a diffusion model to synthesize training pairs, combined with a confidence-based labeling mechanism. The reported gains over semi-supervised baselines are large: on Tiny-ImageNet with a 60% mismatch proportion, the method surpasses OpenMatch by up to 72.5%, depending on the class type.

Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and rely heavily on labeled data, limiting their applicability and performance. To address this, we propose Unsupervised Learning for Class Distribution Mismatch (UCDM), which constructs positive-negative pairs from unlabeled data for classifier training. Our approach randomly samples images and uses a diffusion model to add or erase semantic classes, synthesizing diverse training pairs. Additionally, we introduce a confidence-based labeling mechanism that iteratively assigns pseudo-labels to valuable real-world data and incorporates them into the training process. Extensive experiments on three datasets demonstrate UCDM's superiority over previous semi-supervised methods. Specifically, with a 60% mismatch proportion on the Tiny-ImageNet dataset, our approach, without relying on labeled data, surpasses OpenMatch (with 40 labels per class) by 35.1%, 63.7%, and 72.5% in classifying known, unknown, and new classes, respectively.
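
The abstract does not spell out how the confidence-based labeling mechanism selects samples, so the following is only a minimal sketch of generic confidence-thresholded pseudo-labeling, not UCDM's actual procedure. The function names, the toy logits, and the threshold of 0.9 are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(batch_logits, threshold=0.9):
    """Return (index, label, confidence) for each sample whose top class
    probability reaches `threshold`.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        conf = probs[label]
        if conf >= threshold:
            selected.append((i, label, conf))
    return selected


# Toy classifier outputs for three unlabeled samples over three classes.
logits = [
    [8.0, 0.5, 0.1],  # one dominant class: kept
    [1.0, 1.1, 0.9],  # ambiguous: skipped
    [0.2, 0.1, 6.0],  # one dominant class: kept
]
picked = select_pseudo_labels(logits)
```

In an iterative scheme of the kind the abstract describes, the selected samples would be added to the training set, the classifier retrained, and the selection repeated on the remaining unlabeled data.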
