LG AI CL CV IV · Sep 4, 2021

Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment

Zhanghexuan Ji, Mohammad Abuzar Shaikh, Dana Moukheiber, Sargur Srihari, Yifan Peng, Mingchen Gao

arXiv:2109.01949v1 · 12.525 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of leveraging unlabeled medical data for downstream tasks like image-text retrieval and classification, though it appears incremental as it builds on existing self-supervised and attention-based methods.

The paper addresses the problem of learning joint representations from unlabeled chest X-rays and radiology reports. It proposes JoImTeRNet, which is reported to improve cross-modality retrieval and multi-label classification on the OpenI-IU and MIMIC-CXR datasets.

Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports, which accumulate in clinical routine, without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model is pre-trained for visual-textual matching at both the global image-sentence level and the local image region-word level. Both levels are bidirectionally constrained by cross-entropy-based and ranking-based triplet matching losses. The region-word matching is computed with an attention mechanism, without direct supervision of the mapping between regions and words. The pre-trained multi-modal representation paves the way for downstream tasks that involve image and/or text encoding. We demonstrate the quality of the learned representation through cross-modality retrieval and multi-label classification on two datasets: OpenI-IU and MIMIC-CXR.
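To make the two matching levels in the abstract more concrete, here is a minimal NumPy sketch of an attention-based word-region similarity and a bidirectional ranking-based triplet loss. It is not the authors' implementation: the function names, the `temperature` and `margin` values, and the specific attention form (each word attends over image regions, then is compared to its attended context) are assumptions chosen for illustration, and the cross-entropy-based matching loss mentioned in the abstract is not covered.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_region_score(words, regions, temperature=4.0):
    """Attention-based local similarity (hypothetical form).

    words:   (T, D) word embeddings of one report
    regions: (R, D) region embeddings of one image
    No word-to-region labels are used; the alignment comes from attention.
    """
    w = l2_normalize(words)
    r = l2_normalize(regions)
    attn = softmax(temperature * (w @ r.T), axis=1)  # (T, R): each word attends over regions
    context = attn @ r                               # (T, D): attended region context per word
    sims = np.sum(w * l2_normalize(context), axis=1) # cosine(word, its context)
    return float(sims.mean())

def bidirectional_triplet_loss(scores, margin=0.2):
    """Hinge-based ranking loss in both retrieval directions.

    scores[i, j] is the similarity of image i and report j;
    the diagonal holds the matched pairs, everything else is a negative.
    """
    pos = np.diag(scores)
    cost_i2t = np.maximum(0.0, margin + scores - pos[:, None])  # image i vs. wrong reports
    cost_t2i = np.maximum(0.0, margin + scores - pos[None, :])  # report j vs. wrong images
    off_diag = 1.0 - np.eye(scores.shape[0])
    return float(((cost_i2t + cost_t2i) * off_diag).sum() / scores.shape[0])

# Usage with random stand-in features (assumed shapes: 3 pairs, 5 words, 7 regions, dim 16)
rng = np.random.default_rng(0)
reports = rng.normal(size=(3, 5, 16))
images = rng.normal(size=(3, 7, 16))
local_scores = np.array([[word_region_score(reports[j], images[i]) for j in range(3)]
                         for i in range(3)])
loss = bidirectional_triplet_loss(local_scores)
```

The same triplet loss could be applied to a matrix of global image-sentence similarities. Summing the global and local terms would be one plausible way to combine the two levels, though the abstract does not specify how the losses are weighted.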
