Unsupervised Segmentation of Colonoscopy Images
This work addresses the challenge of limited annotations in medical imaging, which affects both clinicians and researchers, though the contribution is incremental: it applies existing self-supervised methods to a specific domain.
The paper tackles the problem of segmenting colonoscopy images without ground-truth annotations by using self-supervised features from vision transformers. Image-level features achieve classification performance comparable to fully supervised models, and combining the features with unsupervised segmentation enables the discovery of clinically relevant structures.
Colonoscopy plays a crucial role in the diagnosis and prognosis of various gastrointestinal diseases. Because large-scale, high-quality ground-truth annotations are difficult to collect for colonoscopy images, and for medical images more generally, we explore using self-supervised features from vision transformers in three challenging tasks for colonoscopy images. Our results indicate that image-level features learned by DINO models achieve image classification performance comparable to fully supervised models, and that patch-level features contain rich semantic information for object detection. Furthermore, we show that self-supervised features combined with unsupervised segmentation can be used to discover multiple clinically relevant structures in a fully unsupervised manner, demonstrating the tremendous potential of applying these methods in medical image analysis.
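
To make the idea of segmenting from patch-level features concrete, here is a minimal sketch. It is not the authors' pipeline. It clusters per-patch feature vectors with k-means and reshapes the cluster labels onto the patch grid to give a coarse segmentation map. The abstract does not name the unsupervised segmentation method, so k-means is a stand-in chosen for simplicity. The feature array is synthetic and only takes the place of patch tokens that would come from a DINO vision transformer. The function names `farthest_point_init`, `kmeans`, and `segment_patches` are hypothetical.

```python
import numpy as np


def farthest_point_init(features, k):
    """Deterministic init: start at row 0, then repeatedly take the row
    farthest from all centroids chosen so far."""
    chosen = [0]
    min_dist = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        dist = ((features - features[nxt]) ** 2).sum(axis=1)
        min_dist = np.minimum(min_dist, dist)
    return features[chosen].copy()


def kmeans(features, k, n_iter=20):
    """Plain k-means over row vectors; returns one cluster label per row."""
    centroids = farthest_point_init(features, k)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every patch to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels


def segment_patches(patch_features, grid_hw, k):
    """Cluster L2-normalised patch features and lay the labels out on the
    (height, width) patch grid."""
    h, w = grid_hw
    norms = np.linalg.norm(patch_features, axis=1, keepdims=True)
    normed = patch_features / np.clip(norms, 1e-12, None)
    return kmeans(normed, k).reshape(h, w)


# Synthetic stand-in for ViT patch tokens (assumption, not real DINO output):
# an 8x8 patch grid whose left and right halves have different mean features.
rng = np.random.default_rng(0)
h, w, dim = 8, 8, 16
means = np.zeros((h, w, dim))
means[:, : w // 2, 0] = 1.0
means[:, w // 2 :, 1] = 1.0
patch_features = (means + 0.01 * rng.standard_normal((h, w, dim))).reshape(h * w, dim)

seg = segment_patches(patch_features, (h, w), k=2)
print(seg)
```

On this toy input the left and right halves of the grid receive different labels. With real features, the number of clusters `k` would be a free choice, and deciding which clusters correspond to clinically relevant structures would still require inspection.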