OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
The paper addresses a fundamental question in computer vision: how researchers and practitioners can make better use of both labeled and unlabeled data. It is an incremental advance over existing self-supervised and supervised pretraining paradigms.
It tackles the problem of combining self-supervised and fully supervised learning in a single training objective. It proposes OPERA, which the authors report achieves state-of-the-art results in image classification, segmentation, and object detection with both CNN and vision transformer backbones.
The pretrain-finetune paradigm in modern computer vision facilitates the success of self-supervised learning, which tends to achieve better transferability than supervised learning. However, with the availability of massive labeled data, a natural question emerges: how to train a better model with both self and full supervision signals? In this paper, we propose Omni-suPErvised Representation leArning with hierarchical supervisions (OPERA) as a solution. We provide a unified perspective of supervisions from labeled and unlabeled data and propose a unified framework of fully supervised and self-supervised learning. We extract a set of hierarchical proxy representations for each image and impose self and full supervisions on the corresponding proxy representations. Extensive experiments on both convolutional neural networks and vision transformers demonstrate the superiority of OPERA in image classification, segmentation, and object detection. Code is available at: https://github.com/wangck20/OPERA.
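The abstract does not specify how the proxy hierarchy is built or which losses are used. The following minimal sketch makes several assumptions to show how self and full supervision can act on different levels of one representation. It uses a two-level hierarchy: an instance-level proxy trained with an InfoNCE contrastive loss across two augmented views, and a class-level proxy derived from it and trained with cross-entropy only where a label exists. The names `HierarchicalProxyHead` and `omni_loss`, the weight `alpha`, the temperature, and the use of -1 to mark unlabeled images are all hypothetical. NumPy stands in for a deep learning framework, so no gradients are computed. This is not the authors' implementation, which is available in the linked repository.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def info_nce(z1, z2, temperature=0.2):
    """Self-supervision (assumed InfoNCE): row i of z1 should match row i of z2."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature           # (B, B) cosine-similarity matrix
    idx = np.arange(len(z1))
    return float(-log_softmax(logits)[idx, idx].mean())

def cross_entropy(logits, labels):
    """Full supervision on class logits; rows labeled -1 (unlabeled) are skipped."""
    mask = labels >= 0
    if not mask.any():
        return 0.0
    lp = log_softmax(logits[mask])
    return float(-lp[np.arange(mask.sum()), labels[mask]].mean())

class HierarchicalProxyHead:
    """Hypothetical head: backbone feature -> instance proxy -> class proxy."""
    def __init__(self, feat_dim, inst_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.w_inst = rng.normal(0, feat_dim ** -0.5, (feat_dim, inst_dim))
        self.w_cls = rng.normal(0, inst_dim ** -0.5, (inst_dim, num_classes))

    def __call__(self, feats):
        inst = feats @ self.w_inst             # lower level: instance proxy
        cls_logits = inst @ self.w_cls         # higher level: class proxy built on it
        return inst, cls_logits

def omni_loss(head, feats_view1, feats_view2, labels, alpha=1.0):
    """Hypothetical combined objective: self loss on instance proxies + alpha * full loss on class proxies."""
    inst1, cls1 = head(feats_view1)
    inst2, cls2 = head(feats_view2)
    self_loss = info_nce(inst1, inst2)
    full_loss = 0.5 * (cross_entropy(cls1, labels) + cross_entropy(cls2, labels))
    return self_loss + alpha * full_loss, self_loss, full_loss

# Usage: random features stand in for backbone outputs of two augmented views.
rng = np.random.default_rng(42)
B, D = 8, 64
base = rng.normal(size=(B, D))
view1 = base + 0.05 * rng.normal(size=(B, D))
view2 = base + 0.05 * rng.normal(size=(B, D))
labels = np.array([0, 1, 2, 3, -1, -1, 4, -1])  # -1 marks unlabeled images
head = HierarchicalProxyHead(D, 32, num_classes=5)
total, s, f = omni_loss(head, view1, view2, labels)
print(f"self={s:.3f} full={f:.3f} total={total:.3f}")
```

In this sketch, unlabeled images still contribute to the self-supervised term. Only labeled images contribute to the class-level term, so one batch can mix both kinds of data.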