OTCXR: Rethinking Self-supervised Alignment using Optimal Transport for Chest X-ray Analysis
This work targets accurate thoracic disease diagnosis from chest X-rays, offering an incremental improvement by integrating optimal transport into self-supervised learning.
The paper tackles two challenges in self-supervised learning for chest X-ray analysis: semantic alignment and the capture of subtle details. It proposes OTCXR, a framework that combines optimal transport with a Cross-Viewpoint Semantics Infusion Module, and reports superior performance over state-of-the-art methods on three public datasets.
Self-supervised learning (SSL) has emerged as a promising technique for analyzing medical modalities such as X-rays due to its ability to learn without annotations. However, conventional SSL methods face challenges in achieving semantic alignment and capturing subtle details, which limits their ability to accurately represent the underlying anatomical structures and pathological features. To address these limitations, we propose OTCXR, a novel SSL framework that leverages optimal transport (OT) to learn dense semantic invariance. By integrating OT with our Cross-Viewpoint Semantics Infusion Module (CV-SIM), OTCXR enhances the model's ability to capture not only local spatial features but also global contextual dependencies across different viewpoints, making SSL more effective on chest radiographs. Furthermore, OTCXR incorporates variance and covariance regularizations within the OT framework to prioritize clinically relevant information while suppressing less informative features. This ensures that the learned representations are comprehensive and discriminative, which is particularly beneficial for tasks such as thoracic disease diagnosis. We validate OTCXR's efficacy through comprehensive experiments on three publicly available chest X-ray datasets. Our empirical results demonstrate the superiority of OTCXR over state-of-the-art methods across all evaluated tasks, confirming its capability to learn semantically rich representations.
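To make the two ingredients named above more concrete, the following is a minimal NumPy sketch of (a) an entropic optimal transport plan between the dense patch features of two augmented views, computed with Sinkhorn iterations, and (b) variance and covariance penalties in the style of VICReg. It is an illustration under stated assumptions, not the OTCXR implementation: the abstract does not specify the cost function, the OT solver, the marginals, or the exact form of the regularizers, so the cosine cost, uniform marginals, and all function names and hyperparameters (`eps`, `n_iters`, `gamma`) here are hypothetical choices. CV-SIM is not modeled.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (n, d) and y (m, d)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan via Sinkhorn iterations.

    Assumes uniform marginals over patches (an illustrative choice).
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # mass on each patch of view A
    b = np.full(m, 1.0 / m)   # mass on each patch of view B
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows toward marginal a
        v = b / (K.T @ u)     # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def variance_covariance_penalty(z, gamma=1.0, eps=1e-4):
    """VICReg-style regularizers on features z of shape (n, d).

    Variance term: hinge that penalizes per-dimension std below gamma.
    Covariance term: squared off-diagonal covariance, scaled by d.
    """
    z = z - z.mean(axis=0, keepdims=True)
    std = np.sqrt(z.var(axis=0) + eps)
    var_term = np.mean(np.maximum(0.0, gamma - std))
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_term = np.sum(off_diag ** 2) / z.shape[1]
    return var_term, cov_term


# Toy usage: 16 patch embeddings of dimension 32 per view (synthetic data).
rng = np.random.default_rng(0)
view_a = rng.normal(size=(16, 32))
# View B: the same patches, shuffled and slightly perturbed.
view_b = view_a[rng.permutation(16)] + 0.05 * rng.normal(size=(16, 32))

cost = cosine_cost(view_a, view_b)
plan = sinkhorn_plan(cost)
align_loss = float(np.sum(plan * cost))          # transport cost under the plan
var_term, cov_term = variance_covariance_penalty(view_a)
print(align_loss, var_term, cov_term)
```

In this sketch the transport plan acts as a soft patch-to-patch correspondence between views, so the alignment loss does not require the two views to be spatially registered. The variance and covariance terms discourage collapsed or redundant feature dimensions. How OTCXR weights or combines such terms is not stated in the abstract.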