Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval
This work addresses the domain gap in colonoscopic video retrieval, a task with clinical applications in colorectal cancer prevention. The contribution is incremental, however, since it adapts existing contrastive learning techniques to a new medical domain.
The paper tackles colonoscopic video retrieval by constructing a large-scale dataset (Colo-Pair) and proposing a self-supervised contrastive learning method (Colo-SCRL), which the authors report significantly outperforms state-of-the-art methods in retrieval performance.
Colonoscopic video retrieval, which is a critical part of polyp treatment, has great clinical significance for the prevention and treatment of colorectal cancer. However, retrieval models trained on action recognition datasets usually produce unsatisfactory retrieval results on colonoscopic datasets due to the large domain gap between them. To seek a solution to this problem, we construct a large-scale colonoscopic dataset named Colo-Pair for medical practice. Based on this dataset, a simple yet effective training method called Colo-SCRL is proposed for more robust representation learning. It aims to refine general knowledge from colonoscopies through masked autoencoder-based reconstruction and momentum contrast to improve retrieval performance. To the best of our knowledge, this is the first attempt to employ the contrastive learning paradigm for medical video retrieval. Empirical results show that our method significantly outperforms current state-of-the-art methods in the colonoscopic video retrieval task.
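To make the momentum contrast component concrete, the following is a minimal sketch of the generic MoCo-style ingredients: a momentum update of the key encoder and an InfoNCE loss over one positive and several negative keys. It is not the authors' implementation. The function names, the temperature of 0.07, the momentum of 0.999, the use of plain Python lists in place of video embeddings, and the cosine-similarity ranking used for retrieval are all assumptions made for illustration. The masked autoencoder reconstruction branch is omitted.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def momentum_update(query_params, key_params, m=0.999):
    """Move the key encoder slowly toward the query encoder:
    key <- m * key + (1 - m) * query (parameters flattened to a list here)."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query: cross-entropy in which the positive key
    is the correct class among (1 + number of negatives) candidates."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over all candidate logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[0]


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query
    (an assumed retrieval rule, used here only to show how embeddings are consumed)."""
    q = l2_normalize(query)
    scores = [dot(q, l2_normalize(g)) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoded colonoscopic clips.
    query = [1.0, 0.0]
    aligned = info_nce_loss(query, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    misaligned = info_nce_loss(query, [0.0, 1.0], [[1.0, 0.0]])
    print("loss with matching positive:", aligned)
    print("loss with mismatched positive:", misaligned)
    print("ranking:", rank_gallery(query, [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

The loss is small when the query embedding is close to its positive key and far from the negatives, which is the property a retrieval model relies on when ranking gallery clips against a query clip. In a MoCo-style setup the negatives would typically come from a queue of past key embeddings, which this sketch passes in directly as a list.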