CV · Apr 16

Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images

arXiv:2604.14506 · 49.0 · h-index: 36
Predicted impact: top 70% in CV · last 90 days · Originality: Highly original
AI Analysis

This work addresses the challenge of effective self-supervised learning for medical images using Swin Transformers, which lack a global [CLS] token, by introducing attention-guided masking and a noisy teacher to maintain attention head diversity.

The authors propose DAGMaN, a co-distilled attention-guided masked image modeling method with a noisy teacher for self-supervised learning on medical images, achieving state-of-the-art performance on lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised organ clustering.

Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach for extracting useful feature representations from unannotated data. The predominantly used random masking methods make SSL less effective for medical images because neighboring patches are contextually similar, which leads to information leakage and an overly simple pretraining task. The hierarchical shifted window (Swin) transformer, a highly effective architecture for medical images, cannot use advanced masking methods because it lacks a global [CLS] token. Hence, we introduce an attention-guided masking mechanism for Swin within a co-distillation learning framework that selectively masks semantically co-occurring and discriminative patches, reducing information leakage and increasing the difficulty of SSL pretraining. However, attention-guided masking inevitably reduces the diversity of attention heads, which negatively impacts downstream task performance. To address this, we integrate, for the first time, a noisy teacher into the co-distillation framework (termed DAGMaN) that performs attentive masking while preserving high attention head diversity. We demonstrate the capability of DAGMaN on multiple tasks, including full- and few-shot lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised organ clustering.
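
To make the contrast between random and attention-guided masking concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the DAGMaN procedure: the per-patch `attention_scores` list, the `mask_ratio` parameter, and both function names are hypothetical, and the abstract does not say how the scores are obtained for Swin or how the noisy teacher enters the selection.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Baseline: choose the patches to mask uniformly at random."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def attention_guided_mask(attention_scores, mask_ratio):
    """Mask the patches that receive the highest attention.

    attention_scores is a hypothetical list with one score per patch.
    How such scores are derived for a Swin transformer, which has no
    global [CLS] token, is not covered by this sketch.
    """
    num_patches = len(attention_scores)
    num_masked = int(num_patches * mask_ratio)
    # Rank patch indices from most attended to least attended.
    ranked = sorted(range(num_patches),
                    key=lambda i: attention_scores[i],
                    reverse=True)
    return sorted(ranked[:num_masked])


# Toy example with 8 patches; the scores are made up for illustration.
scores = [0.05, 0.30, 0.02, 0.25, 0.08, 0.20, 0.04, 0.06]
print(attention_guided_mask(scores, mask_ratio=0.5))  # [1, 3, 4, 5]
print(random_mask(len(scores), 0.5, random.Random(0)))
```

Random masking may leave a highly attended patch visible, letting the model reconstruct its masked neighbors from nearly identical context. Selecting by score hides the most informative patches first, which is the sense in which the abstract describes attention-guided masking as making pretraining harder.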

