Advancing Radiograph Representation Learning with Masked Record Modeling
This work aims to improve medical image analysis for radiography tasks by combining complementary learning objectives, though the contribution appears incremental because it mainly combines existing approaches.
The paper tackles radiograph representation learning by proposing a unified framework, masked record modeling, that combines self-completion of images with report-completion. The resulting models perform strongly in label-efficient fine-tuning, for example reaching 88.5% mean AUC on CheXpert with 1% labeled data.
Modern studies in radiograph representation learning (R$^2$L) rely on either self-supervision to encode invariant semantics or associated radiology reports to incorporate medical expertise, while the complementarity between the two has received little attention. To explore this, we formulate self-completion and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM). In practice, MRM reconstructs masked image patches and masked report tokens following a multi-task scheme to learn knowledge-enhanced semantic representations. With MRM pre-training, we obtain pre-trained models that transfer well to various radiography tasks. Specifically, we find that MRM offers superior performance in label-efficient fine-tuning. For instance, MRM achieves 88.5% mean AUC on CheXpert using 1% labeled data, outperforming previous R$^2$L methods trained with 100% labels. On NIH ChestX-ray, MRM outperforms the best-performing counterpart by about 3% under small labeling ratios. In addition, MRM surpasses self- and report-supervised pre-training in identifying the pneumonia type and the pneumothorax area, sometimes by large margins.
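The multi-task scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mask ratios, the mean-squared-error and cross-entropy losses, and the `report_weight` parameter are all assumptions chosen for illustration, and the predictions are passed in as plain lists instead of coming from a trained network. The sketch only shows the shape of the objective: mask part of the image patches and part of the report tokens, score the reconstruction on the masked positions only, and sum the two losses.

```python
import math
import random


def random_mask(n, ratio, rng):
    """Pick int(n * ratio) distinct positions out of n to mask (hypothetical helper)."""
    k = int(n * ratio)
    return sorted(rng.sample(range(n), k))


def patch_mse(pred, target, masked):
    """Mean squared error over the pixels of masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def token_cross_entropy(probs, target_ids, masked):
    """Mean negative log-likelihood of the true token at masked positions only."""
    if not masked:
        return 0.0
    nll = -sum(math.log(probs[i][target_ids[i]]) for i in masked)
    return nll / len(masked)


def mrm_loss(image_loss, report_loss, report_weight=1.0):
    """Multi-task objective: image term plus weighted report term.

    report_weight is an assumed hyperparameter, not a value from the paper.
    """
    return image_loss + report_weight * report_loss


# Toy usage with assumed mask ratios (0.75 for patches, 0.5 for tokens).
rng = random.Random(0)
masked_patches = random_mask(8, 0.75, rng)  # 6 of 8 patches hidden
masked_tokens = random_mask(6, 0.5, rng)    # 3 of 6 report tokens hidden
```

In a real pre-training setup, `pred` and `probs` would be produced by decoders that see only the unmasked patches and tokens, so that minimizing the combined loss forces the encoder to draw on both the image and the report.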