CV LG IV
Feb 28, 2020

Self-supervised Representation Learning for Ultrasound Video

Jianbo Jiao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

arXiv:2003.00105v1
16.861 citations

Originality: Incremental advance

AI Analysis

This work addresses the expense and scarcity of expert annotations in medical imaging by learning representations from unlabeled data. The advance is incremental: it adapts existing self-supervised techniques to the specific domain of ultrasound video.

The paper tackles the problem of learning representations from unlabeled ultrasound video data by proposing a self-supervised approach that uses anatomy-aware tasks, such as correcting video clip order and predicting geometric transformations, to train models without human annotations. Experiments on fetal ultrasound video show that this method effectively learns transferable representations, improving performance on downstream tasks like standard plane detection and saliency prediction.

Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. Therefore, there is significant interest in learning representations from unlabelled raw data. In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation. We assume that in order to learn such a representation, the model should identify anatomical structures from the unlabelled data. Therefore we force the model to address anatomy-aware tasks with free supervision from the data itself. Specifically, the model is designed to correct the order of a reshuffled video clip and at the same time predict the geometric transformation applied to the video clip. Experiments on fetal ultrasound video show that the proposed approach can effectively learn meaningful and strong representations, which transfer well to downstream tasks like standard plane detection and saliency prediction.
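
The two pretext tasks named in the abstract (restoring the order of a reshuffled clip and predicting the applied geometric transformation) can be illustrated with a minimal sketch of how training samples and their "free" labels might be generated. Everything specific here is an assumption for illustration and is not taken from the paper: the clip length of 4 frames, the use of 90-degree rotations as the transformation set, and the helper names `make_pretext_sample` and `undo_pretext`.

```python
import itertools
import numpy as np

# Hypothetical choices, not taken from the paper: 4 frames per clip and
# rotations by multiples of 90 degrees as the geometric transformation.
CLIP_LEN = 4
PERMUTATIONS = list(itertools.permutations(range(CLIP_LEN)))  # 24 orderings


def make_pretext_sample(clip, rng):
    """Build one self-supervised sample from an unlabelled clip.

    clip: array of shape (CLIP_LEN, H, W), frames in temporal order.
    Returns the shuffled and rotated clip plus two labels that come
    from the data itself: the permutation index and the rotation index.
    """
    assert clip.shape[0] == CLIP_LEN
    perm_label = int(rng.integers(len(PERMUTATIONS)))
    rot_label = int(rng.integers(4))  # number of 90-degree turns
    shuffled = clip[list(PERMUTATIONS[perm_label])]
    rotated = np.rot90(shuffled, k=rot_label, axes=(1, 2))
    return rotated, perm_label, rot_label


def undo_pretext(sample, perm_label, rot_label):
    """Invert both operations; this is what correct predictions allow."""
    unrotated = np.rot90(sample, k=-rot_label, axes=(1, 2))
    inverse = np.argsort(PERMUTATIONS[perm_label])  # inverse permutation
    return unrotated[inverse]


rng = np.random.default_rng(0)
clip = rng.random((CLIP_LEN, 8, 6))  # stand-in for ultrasound frames
sample, perm_label, rot_label = make_pretext_sample(clip, rng)
restored = undo_pretext(sample, perm_label, rot_label)
```

In a setup like this, a network would take `sample` as input and be trained to predict `perm_label` and `rot_label` jointly, so no human annotation is needed. How the paper actually parameterises the two tasks and combines their losses is not stated in the abstract.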
