Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning
This work addresses the need for automated diagnostic tools for low back pain, which affects millions worldwide, by providing a robust model for clinical decision support. The contribution is incremental: it adapts existing contrastive learning methods to a specific medical domain.
The authors develop LumbarCLIP, a multimodal framework for low back pain diagnosis that uses contrastive learning to align lumbar spine MRI scans with radiological text reports. It achieves up to 95.00% accuracy and a 94.75% F1-score on the test set.
Low back pain affects millions worldwide, driving the need for robust diagnostic models that can jointly analyze complex medical images and accompanying text reports. We present LumbarCLIP, a novel multimodal framework that leverages contrastive language-image pretraining to align lumbar spine MRI scans with corresponding radiological descriptions. Built upon a curated dataset containing axial MRI views paired with expert-written reports, LumbarCLIP integrates vision encoders (ResNet-50, Vision Transformer, Swin Transformer) with a BERT-based text encoder to extract dense representations. These are projected into a shared embedding space via learnable projection heads, configurable as linear or non-linear, and normalized to facilitate stable contrastive training using a soft CLIP loss. Our model achieves state-of-the-art performance on downstream classification, reaching up to 95.00% accuracy and 94.75% F1-score on the test set, despite inherent class imbalance. Extensive ablation studies demonstrate that linear projection heads yield more effective cross-modal alignment than non-linear variants. LumbarCLIP offers a promising foundation for automated musculoskeletal diagnosis and clinical decision support.
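The projection and alignment steps described above can be illustrated with a minimal, dependency-free Python sketch. The abstract does not define the soft CLIP loss, so the formulation here is an assumption: soft targets are taken as the softmax of the averaged image-image and text-text similarities, and a soft cross-entropy is applied in both directions. All function names, the temperature of 0.07, and the toy vectors are hypothetical and are not taken from LumbarCLIP.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (epsilon guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]

def linear_head(x, weight):
    """Linear projection head; weight is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(row):
    m = max(row)
    log_total = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_total for v in row]

def similarity(a, b, temperature):
    """Pairwise dot products between two lists of vectors, scaled by 1/temperature."""
    return [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]

def soft_cross_entropy(logits, targets):
    """Mean over rows of -sum(target * log_softmax(logits))."""
    total = 0.0
    for logit_row, target_row in zip(logits, targets):
        log_probs = log_softmax(logit_row)
        total -= sum(t * lp for t, lp in zip(target_row, log_probs))
    return total / len(logits)

def soft_clip_loss(image_emb, text_emb, temperature=0.07):
    """Assumed soft CLIP loss over a batch of paired image/text embeddings."""
    image_emb = [l2_normalize(v) for v in image_emb]
    text_emb = [l2_normalize(v) for v in text_emb]
    logits = similarity(image_emb, text_emb, temperature)  # rows: images, cols: texts
    # Soft targets from within-modality similarities (assumption, see lead-in).
    img_sim = similarity(image_emb, image_emb, temperature)
    txt_sim = similarity(text_emb, text_emb, temperature)
    targets = [softmax([(a + b) / 2 for a, b in zip(ri, rt)])
               for ri, rt in zip(img_sim, txt_sim)]
    logits_t = [list(col) for col in zip(*logits)]  # rows: texts, cols: images
    # The averaged similarity matrix is symmetric, so the same row-wise
    # targets serve both the image-to-text and text-to-image directions.
    return 0.5 * (soft_cross_entropy(logits, targets)
                  + soft_cross_entropy(logits_t, targets))

# Toy batch: two "encoder outputs" per modality, projected by an identity head.
identity = [[1.0, 0.0], [0.0, 1.0]]
images = [linear_head(v, identity) for v in ([2.0, 0.0], [0.0, 3.0])]
texts = [linear_head(v, identity) for v in ([1.0, 0.0], [0.0, 1.0])]
aligned_loss = soft_clip_loss(images, texts)
shuffled_loss = soft_clip_loss(images, texts[::-1])
```

In this toy batch, correctly paired embeddings produce a lower loss than the same embeddings with the text order reversed, which is the behaviour a contrastive objective should reward. A non-linear head would insert an activation and a second weight matrix after `linear_head`; the abstract reports that the linear variant aligned the modalities more effectively.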