Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging
This work addresses the challenge of limited labeled datasets for medical imaging tasks, offering a generalizable solution that leverages readily available text-image pairs. The contribution is incremental, since it builds on existing contrastive and transfer learning methods.
The paper tackles the problem of insufficient labeled data in medical imaging by using textual reports as weak supervision to pre-train neural networks, resulting in performance improvements that reduce the need for labeled data by 67%-98% across three classification tasks.
A key challenge in training neural networks for a given medical imaging task is often the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports, which are often readily available in medical records, contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. This approach can be applied to any task for which text-image pairs are readily available. We evaluate our method on three classification tasks and find consistent performance improvements, reducing the need for labeled data by 67%-98%.
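The image-text matching objective described above can be illustrated with a minimal sketch. The abstract does not specify the loss, so the symmetric softmax (InfoNCE-style) form, the `temperature` value, the function names, and the toy random embeddings below are all assumptions made for illustration, not the authors' implementation. In practice the embeddings would come from an image encoder and a text encoder trained jointly; only the image encoder would then be fine-tuned on the small labeled dataset.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_matching_loss(image_emb, text_emb, temperature=0.1):
    """Symmetric image-text matching loss over a batch of paired embeddings.

    Assumption: pair i (image i, report i) is the only positive match;
    every other report in the batch acts as a negative for image i.
    """
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    # logits[i][j] = scaled cosine similarity between image i and report j
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Toy stand-ins for encoder outputs (hypothetical data, batch of 8, dim 16).
random.seed(0)
texts = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
aligned = [[x + 0.01 * random.gauss(0, 1) for x in t] for t in texts]
unrelated = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

loss_aligned = contrastive_matching_loss(aligned, texts)
loss_unrelated = contrastive_matching_loss(unrelated, texts)
print(loss_aligned < loss_unrelated)
```

Image embeddings that sit close to their paired report embeddings yield a lower loss than unrelated ones, which is the signal that would push a feature extractor toward report-relevant image features during pre-training.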