CV AI Oct 27, 2021

Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning

Aakash Kaku, Sahana Upadhya, Narges Razavian

arXiv:2110.14805v111.136 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of limited labeled data in medical imaging by enhancing feature learning in pre-training, though it is incremental as it builds on the existing MoCo framework.

The paper modifies the momentum contrastive (MoCo) method to bring the intermediate-layer representations of two augmented versions of an image closer together, with the aim of improving self-supervised pre-training for medical imaging. In low-labeled data regimes (e.g. 1% labeled data), this yields an average gain of 5% across three datasets.

We show that bringing intermediate layers' representations of two augmented versions of an image closer together in self-supervised learning helps to improve the momentum contrastive (MoCo) method. To this end, in addition to the contrastive loss, we minimize the mean squared error between the intermediate layer representations or make their cross-correlation matrix closer to an identity matrix. Both loss objectives either outperform standard MoCo, or achieve similar performances on three diverse medical imaging datasets: NIH-Chest Xrays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in a low-labeled data regime (e.g. 1% labeled data) with an average gain of 5% across three datasets. We analyze the models trained using our novel approach via feature similarity analysis and layer-wise probing. Our analysis reveals that models trained via our approach have higher feature reuse compared to a standard MoCo and learn informative features earlier in the network. Finally, by comparing the output probability distribution of models fine-tuned on small versus large labeled data, we conclude that our proposed method of pre-training leads to lower Kolmogorov-Smirnov distance, as compared to a standard MoCo. This provides additional evidence that our proposed method learns more informative features in the pre-training phase which could be leveraged in a low-labeled data regime.
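
The two auxiliary objectives described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the intermediate feature maps of the two augmented views have already been pooled into arrays of shape (batch, features). The function names, the `weight` and `off_diag_weight` parameters, and the way the auxiliary term is added to the contrastive loss are hypothetical choices made for illustration.

```python
import numpy as np


def mse_loss(z_a, z_b):
    """Mean squared error between two (batch, features) arrays."""
    return float(np.mean((z_a - z_b) ** 2))


def cross_correlation_loss(z_a, z_b, off_diag_weight=5e-3, eps=1e-8):
    """Penalize deviation of the cross-correlation matrix from identity.

    off_diag_weight is an assumed hyperparameter, not a value from the paper.
    """
    n = z_a.shape[0]
    # Standardize each feature over the batch dimension.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # (features, features) cross-correlation matrix between the two views.
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)          # push diagonal toward 1
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # push the rest toward 0
    return float(on_diag + off_diag_weight * off_diag)


def total_loss(contrastive, z_a, z_b, weight=1.0, kind="mse"):
    """Add one auxiliary intermediate-layer term to a contrastive loss value.

    `contrastive` stands in for the MoCo loss, which is computed elsewhere.
    """
    if kind == "mse":
        aux = mse_loss(z_a, z_b)
    elif kind == "cross_correlation":
        aux = cross_correlation_loss(z_a, z_b)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return contrastive + weight * aux


rng = np.random.default_rng(0)
view_a = rng.normal(size=(256, 8))                   # pooled features, view 1
view_b = view_a + 0.1 * rng.normal(size=(256, 8))    # similar features, view 2
print(total_loss(1.0, view_a, view_b, kind="mse"))
print(total_loss(1.0, view_a, view_b, kind="cross_correlation"))
```

Both terms shrink as the two views' intermediate features agree. The MSE term compares them element-wise, while the cross-correlation term only asks that matching feature dimensions correlate and non-matching ones do not.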
