DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision
This work addresses the need for scalable, transferable medical image representations, particularly in low-label settings, though the contribution is incremental because it builds on existing SSL methods.
The paper tackles shortcut learning and limited generalizability in self-supervised learning for medical images. It introduces DiSSECT, which integrates multi-scale vector quantization to impose a discrete bottleneck, and reports strong performance on classification and segmentation tasks with minimal fine-tuning and high label efficiency.
Self-supervised learning (SSL) has emerged as a powerful paradigm for medical image representation learning, particularly in settings with limited labeled data. However, existing SSL methods often rely on complex architectures, anatomy-specific priors, or heavily tuned augmentations, which limit their scalability and generalizability. More critically, these models are prone to shortcut learning, especially in modalities like chest X-rays, where anatomical similarity is high and pathology is subtle. In this work, we introduce DiSSECT (Discrete Self-Supervision for Efficient Clinical Transferable Representations), a framework that integrates multi-scale vector quantization into the SSL pipeline to impose a discrete representational bottleneck. This bottleneck constrains the model to learn repeatable, structure-aware features while suppressing view-specific or low-utility patterns, improving representation transfer across tasks and domains. DiSSECT achieves strong performance on both classification and segmentation tasks, requiring minimal or no fine-tuning, and shows particularly high label efficiency in low-label regimes. We validate DiSSECT across multiple public medical imaging datasets, demonstrating its robustness and generalizability compared to existing state-of-the-art approaches.
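
To make the idea of a discrete bottleneck concrete, the following is a minimal NumPy sketch of generic multi-scale vector quantization. It is not the DiSSECT implementation: the abstract does not specify the architecture, codebook sizes, or scales, so the function names (`quantize`, `avg_pool`, `multi_scale_quantize`), the use of average pooling to form scales, the pooling factors, and the random codebooks are all assumptions chosen for illustration. Training-time components such as codebook updates are omitted.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared L2)."""
    # features: (N, D), codebook: (K, D) -> pairwise distances of shape (N, K)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return indices, codebook[indices]

def avg_pool(fmap, factor):
    """Non-overlapping average pooling of an (H, W, D) feature map."""
    h, w, d = fmap.shape
    assert h % factor == 0 and w % factor == 0, "factor must divide H and W"
    return fmap.reshape(h // factor, factor, w // factor, factor, d).mean(axis=(1, 3))

def multi_scale_quantize(fmap, codebooks):
    """Quantize a feature map at several scales.

    codebooks maps a pooling factor (hypothetical scale choice) to a (K, D) array.
    Returns {factor: (index_map, quantized_map)}.
    """
    out = {}
    for factor, codebook in codebooks.items():
        pooled = avg_pool(fmap, factor)
        h, w, d = pooled.shape
        idx, q = quantize(pooled.reshape(-1, d), codebook)
        out[factor] = (idx.reshape(h, w), q.reshape(h, w, d))
    return out

# Toy data standing in for encoder outputs; all sizes are illustrative.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8, 16))
codebooks = {f: rng.normal(size=(32, 16)) for f in (1, 2, 4)}
codes = multi_scale_quantize(fmap, codebooks)
for factor, (idx, q) in codes.items():
    print(factor, idx.shape, q.shape)
```

Every spatial position is replaced by one of a fixed set of code vectors, so information that does not map consistently onto a code is discarded. That is the general sense in which a quantization step acts as a bottleneck; how DiSSECT combines the scales with its SSL objective is not described in the abstract.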