CV | Dec 19, 2023

DMT: Comprehensive Distillation with Multiple Self-supervised Teachers

Yuang Liu, Jing Wang, Qiang Zhou, Fan Wang, Jun Wang, Wei Zhang

arXiv:2312.11938v12.82 citations | h-index: 7 | ICASSP

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and complementary visual representations in computer vision. It is an incremental advance, since it builds on existing self-supervised learning paradigms.

Self-supervised models are typically pretrained in isolation, each within a single framework. The paper introduces DMT, which compresses pretrained models by distilling knowledge from multiple self-supervised teachers, and reports a 4.0% gain in AP/mIoU on dense tasks (MS-COCO and ADE20K).

Numerous self-supervised learning paradigms, such as contrastive learning and masked image modeling, have been proposed to acquire powerful and general representations from unlabeled data. However, these models are commonly pretrained within their specific framework alone, failing to consider the complementary nature of visual representations. To tackle this issue, we introduce Comprehensive Distillation with Multiple Self-supervised Teachers (DMT) for pretrained model compression, which leverages the strengths of multiple off-the-shelf self-supervised models. Our experimental results on prominent benchmark datasets exhibit that the proposed method significantly surpasses state-of-the-art competitors while retaining favorable efficiency metrics. On classification tasks, our DMT framework utilizing three different self-supervised ViT-Base teachers enhances the performance of both small/tiny models and the base model itself. For dense tasks, DMT elevates the AP/mIoU of standard SSL models on MS-COCO and ADE20K datasets by 4.0%.
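The abstract does not spell out the distillation objective, so the following is only a minimal sketch of generic multi-teacher feature distillation, not the paper's actual loss. It assumes the student's feature vector is pulled toward each teacher's feature vector with a mean-squared-error term, and that the per-teacher terms are combined with weights (uniform by default). The names `mse`, `multi_teacher_loss`, and `weights` are hypothetical and chosen for illustration.

```python
from typing import Optional, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_loss(
    student: Vector,
    teachers: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-teacher MSE terms (an assumed, generic objective).

    With weights=None every teacher contributes equally (1 / number of teachers).
    """
    if not teachers:
        raise ValueError("at least one teacher is required")
    if weights is None:
        weights = [1.0 / len(teachers)] * len(teachers)
    if len(weights) != len(teachers):
        raise ValueError("need exactly one weight per teacher")
    return sum(w * mse(student, t) for w, t in zip(weights, teachers))


# Toy example: two teachers, two-dimensional features.
student_features = [1.0, 2.0]
teacher_features = [[1.0, 2.0], [3.0, 2.0]]
uniform_loss = multi_teacher_loss(student_features, teacher_features)
weighted_loss = multi_teacher_loss(student_features, teacher_features, [0.25, 0.75])
```

In this toy case the first teacher matches the student exactly and the second differs in one coordinate, so the uniform loss is 1.0 and shifting weight toward the second teacher raises it to 1.5. A real setup would compute these terms on batched network features and would likely need projection layers when student and teacher dimensions differ.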
