CV AI · Jul 22, 2023

Hallucination Improves the Performance of Unsupervised Visual Representation Learning

Jing Wu, Jennifer Hobbs, Naira Hovakimyan

arXiv:2307.12168v1 · 14.523 citations · h-index: 54

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in contrastive unsupervised visual representation learning, the scarcity of sufficiently varied positive pairs, and offers an incremental improvement to existing models.

The paper tackles two weaknesses of contrastive learning, too few positive pairs and a resulting tendency to overfit, by proposing Hallucinator, a differentiable module that generates additional positive samples in feature space. Under the linear classification protocol it yields stable accuracy gains of 0.3% to 3.0% on CIFAR10&100, Tiny ImageNet, STL-10 and ImageNet.

Contrastive learning models based on a Siamese structure have demonstrated remarkable performance in self-supervised learning. This success relies on two conditions: a sufficient number of positive pairs and adequate variation between them. If these conditions are not met, the frameworks lack semantic contrast and are prone to overfitting. To address both issues, we propose Hallucinator, which efficiently generates additional positive samples for further contrast. The Hallucinator is differentiable and creates new data in the feature space. It is therefore optimized directly with the pre-training task and adds nearly negligible computation. Moreover, we reduce the mutual information of hallucinated pairs and smooth them through non-linear operations. This process helps avoid over-confident contrastive learning models during training and yields more transformation-invariant feature embeddings. Remarkably, we show empirically that the proposed Hallucinator generalizes well to various contrastive learning models, including MoCoV1&V2, SimCLR and SimSiam. Under the linear classification protocol, a stable accuracy gain is achieved, ranging from 0.3% to 3.0% on CIFAR10&100, Tiny ImageNet, STL-10 and ImageNet. The improvement is also observed when transferring pre-trained encoders to downstream tasks, including object detection and segmentation.
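The idea of hallucinating an extra positive in feature space can be illustrated with a minimal sketch. This is not the authors' implementation: the extrapolation step, the `tanh` smoothing, the `lam` parameter, and the helper names `hallucinate` and `info_nce` are all assumptions chosen for illustration, and the abstract does not specify any of them. The sketch also uses plain Python lists, so it shows only the forward computation; in practice the same operations would run inside an autograd framework so that the module is trained jointly with the encoder.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (contrastive losses compare directions)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def hallucinate(z_a, z_b, lam):
    """Create an extra positive from the feature pair (z_a, z_b).

    ASSUMPTION: pushing z_a away from z_b by a factor lam >= 0 stands in for
    "reducing mutual information"; tanh stands in for the non-linear smoothing.
    """
    extrapolated = [a + lam * (a - b) for a, b in zip(z_a, z_b)]
    return [math.tanh(x) for x in extrapolated]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query: -log softmax of the positive's similarity."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


rng = random.Random(0)
dim = 8
z1 = [rng.gauss(0, 1) for _ in range(dim)]    # feature of augmented view 1
z2 = [a + 0.1 * rng.gauss(0, 1) for a in z1]  # feature of augmented view 2
negatives = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]

# Hallucinated positive, generated from features rather than from a new image.
z_h = hallucinate(z1, z2, lam=rng.uniform(0.0, 1.0))

# Loss on the original pair plus loss on the additional hallucinated pair.
loss = info_nce(z2, z1, negatives) + info_nce(z2, z_h, negatives)
print(f"total loss: {loss:.4f}")
```

Because the extra positive is computed from features that already exist, the added cost is a few vector operations per pair, which is consistent with the abstract's claim of nearly negligible computation.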
