CV · LG · Nov 30, 2021

MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

arXiv:2111.15340v1 · 18 citations
Originality: Highly original
AI Analysis

MC-SSL0.0 addresses the problem of incomplete labeling in image datasets, offering computer vision researchers a self-supervised learning (SSL) framework that improves multi-concept modeling.

The paper argues that current SSL methods are limited to modeling a single dominant concept per image. It proposes MC-SSL0.0 to model all concepts in an image without labels. The reported results show it surpassing existing SSL methods and supervised transfer learning on multi-label and multi-class classification tasks.

Self-supervised pretraining is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pretraining has been shown to outperform supervised pretraining for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training images, which convey multiple concepts but are annotated using a single dominant class label. Although Self-Supervised Learning (SSL) is, in principle, free of this limitation, the choice of pretext task facilitating SSL perpetuates this shortcoming by driving the learning process towards a single-concept output. This study aims to investigate the possibility of modelling all the concepts present in an image without using labels. In this respect, the proposed SSL framework MC-SSL0.0 is a step towards Multi-Concept Self-Supervised Learning (MC-SSL) that goes beyond modelling a single dominant label in an image to effectively utilise the information from all the concepts present in it. MC-SSL0.0 consists of two core design concepts: group masked model learning, and learning of pseudo-concepts for data tokens using a momentum encoder (teacher-student) framework. The experimental results on multi-label and multi-class image classification downstream tasks demonstrate that MC-SSL0.0 not only surpasses existing SSL methods but also outperforms supervised transfer learning. The source code will be made publicly available for the community to train on a bigger corpus.
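
The two design concepts named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the sketch assumes that "group masking" means masking contiguous rectangular blocks of patch tokens rather than isolated tokens, and that the momentum encoder is updated as an exponential moving average of the student weights. The function names `group_mask` and `ema_update`, the `max_block` parameter, and the default momentum of 0.996 are all hypothetical choices made for illustration.

```python
import random


def group_mask(grid_h, grid_w, mask_ratio, max_block=3, rng=None):
    """Mask rectangular groups of patches on a grid_h x grid_w token grid.

    Assumption: groups are random rectangles of side 1..max_block.
    Returns a list of lists of bools (True = masked). At least
    int(mask_ratio * grid_h * grid_w) patches end up masked.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = rng or random.Random()
    mask = [[False] * grid_w for _ in range(grid_h)]
    target = int(mask_ratio * grid_h * grid_w)
    masked = 0
    while masked < target:
        # Pick a random block size and position that fits inside the grid.
        bh = rng.randint(1, min(max_block, grid_h))
        bw = rng.randint(1, min(max_block, grid_w))
        top = rng.randint(0, grid_h - bh)
        left = rng.randint(0, grid_w - bw)
        for r in range(top, top + bh):
            for c in range(left, left + bw):
                if not mask[r][c]:
                    mask[r][c] = True
                    masked += 1
    return mask


def ema_update(teacher, student, momentum=0.996):
    """Momentum (teacher) update: teacher <- m * teacher + (1 - m) * student.

    Weights are plain lists of floats here; the momentum value is an
    illustrative default, not a figure taken from the paper.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    mask = group_mask(14, 14, mask_ratio=0.5, rng=random.Random(0))
    for row in mask:
        print("".join("#" if cell else "." for cell in row))
    print(ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9))
```

In a teacher-student setup of this kind, the student would typically see the masked input and be trained to match the teacher's per-token outputs, while only the student receives gradients and the teacher follows through `ema_update`. How MC-SSL0.0 defines its groups and pseudo-concept targets is specified in the paper itself, not here.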

