CV · Dec 19, 2023

DMT: Comprehensive Distillation with Multiple Self-supervised Teachers

arXiv:2312.11938v1 · 2 citations · h-index: 7 · ICASSP
Originality: Incremental advance
AI Analysis

This work addresses the need for more efficient and complementary visual representations in computer vision, though it is incremental because it builds on existing self-supervised learning paradigms.

The paper tackles the problem that self-supervised learning models are trained in isolation. It introduces DMT, a method that distills knowledge from multiple self-supervised teachers to compress pretrained models, yielding gains such as a 4.0% increase in AP/mIoU on dense tasks (MS-COCO and ADE20K).

Numerous self-supervised learning paradigms, such as contrastive learning and masked image modeling, have been proposed to acquire powerful and general representations from unlabeled data. However, these models are commonly pretrained within their own framework alone, failing to consider the complementary nature of visual representations. To tackle this issue, we introduce Comprehensive Distillation with Multiple Self-supervised Teachers (DMT) for pretrained model compression, which leverages the strengths of multiple off-the-shelf self-supervised models. Our experimental results on prominent benchmark datasets show that the proposed method significantly surpasses state-of-the-art competitors while retaining favorable efficiency metrics. On classification tasks, our DMT framework, using three different self-supervised ViT-Base teachers, improves the performance of both small/tiny models and the base model itself. For dense tasks, DMT raises the AP/mIoU of standard SSL models on the MS-COCO and ADE20K datasets by 4.0%.
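The abstract does not spell out DMT's distillation objective. Purely as an illustration of the general idea of distilling one smaller student from several frozen teachers, the sketch below assumes a generic feature-matching setup: a hypothetical per-teacher linear projection head maps the student feature into each teacher's feature space, a cosine-distance loss is computed against each teacher, and the per-teacher losses are combined with (by default uniform) weights. The function names, the cosine loss, and the weighting are assumptions, not the paper's method.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]  # rows = output dims, cols = input dims


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Apply a linear projection head: y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb + 1e-12)


def multi_teacher_distill_loss(
    student_feat: Sequence[float],
    teacher_feats: Sequence[Sequence[float]],
    heads: Sequence[Matrix],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-teacher feature-matching losses (hypothetical objective).

    heads[k] is an assumed projection from the student's feature dimension
    to teacher k's feature dimension; teachers are treated as frozen targets.
    """
    k = len(teacher_feats)
    if weights is None:
        weights = [1.0 / k] * k  # assumption: uniform weighting of teachers
    total = 0.0
    for t_feat, head, w in zip(teacher_feats, heads, weights):
        projected = matvec(head, student_feat)
        total += w * cosine_distance(projected, t_feat)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    student_dim, teacher_dim, n_teachers = 4, 6, 3  # toy sizes, not ViT dims
    student = [rng.gauss(0, 1) for _ in range(student_dim)]
    teachers = [[rng.gauss(0, 1) for _ in range(teacher_dim)] for _ in range(n_teachers)]
    heads = [
        [[rng.gauss(0, 0.1) for _ in range(student_dim)] for _ in range(teacher_dim)]
        for _ in range(n_teachers)
    ]
    print("toy multi-teacher loss:", multi_teacher_distill_loss(student, teachers, heads))
```

In a real training loop the student backbone and the projection heads would be optimized to minimize this loss while the teachers stay fixed; the sketch only computes the loss value for one feature vector.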
