CV · Nov 6, 2025

MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging

arXiv:2511.04016v1 · h-index: 4
Originality: Incremental advance
AI Analysis

MedDChest provides a better starting point for thoracic diagnostic tasks, though it is an incremental improvement over existing domain adaptation approaches.

The authors tackle the domain gap problem in medical imaging with MedDChest, a foundational Vision Transformer pre-trained from scratch on over 1.2 million thoracic images (chest X-ray and CT). After fine-tuning, it significantly outperforms ImageNet-pretrained models on downstream diagnostic tasks.

The performance of vision models in medical imaging is often hindered by the prevailing paradigm of fine-tuning backbones pre-trained on out-of-domain natural images. To address this fundamental domain gap, we propose MedDChest, a new foundational Vision Transformer (ViT) model optimized specifically for thoracic imaging. We pre-trained MedDChest from scratch on a massive, curated, multimodal dataset of over 1.2 million images, encompassing different modalities including Chest X-ray and Computed Tomography (CT) compiled from 10 public sources. A core technical contribution of our work is Guided Random Resized Crops, a novel content-aware data augmentation strategy that biases sampling towards anatomically relevant regions, overcoming the inefficiency of standard cropping techniques on medical scans. We validate our model's effectiveness by fine-tuning it on a diverse set of downstream diagnostic tasks. Comprehensive experiments empirically demonstrate that MedDChest significantly outperforms strong, publicly available ImageNet-pretrained models. By establishing the superiority of large-scale, in-domain pre-training combined with domain-specific data augmentation, MedDChest provides a powerful and robust feature extractor that serves as a significantly better starting point for a wide array of thoracic diagnostic tasks. The model weights will be made publicly available to foster future research and applications.
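The abstract describes Guided Random Resized Crops only at a high level: crop sampling is biased towards anatomically relevant regions. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes that "content" can be approximated by pixels brighter than the image mean and that the bias is applied by rejection sampling; the function name, the `min_content` threshold, and the `max_tries` limit are all hypothetical.

```python
import numpy as np


def guided_crop_box(image, rng, scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3),
                    min_content=0.5, max_tries=20):
    """Sample a crop box (top, left, height, width) biased towards content.

    Assumption (not from the paper): content = pixels above the image mean.
    """
    h, w = image.shape
    content = image > image.mean()  # hypothetical content mask

    # Fallback: the full image, replaced by the best candidate seen so far.
    best_box, best_frac = (0, 0, h, w), content.mean()

    for _ in range(max_tries):
        # Standard random-resized-crop sampling: area fraction, then
        # log-uniform aspect ratio.
        area = rng.uniform(*scale) * h * w
        aspect = np.exp(rng.uniform(np.log(ratio[0]), np.log(ratio[1])))
        cw = int(round(np.sqrt(area * aspect)))
        ch = int(round(np.sqrt(area / aspect)))
        if not (0 < cw <= w and 0 < ch <= h):
            continue

        top = int(rng.integers(0, h - ch + 1))
        left = int(rng.integers(0, w - cw + 1))

        # Guidance step: accept only crops that contain enough content.
        frac = content[top:top + ch, left:left + cw].mean()
        if frac >= min_content:
            return top, left, ch, cw
        if frac > best_frac:
            best_box, best_frac = (top, left, ch, cw), frac

    return best_box


# Usage on a synthetic "scan": dark background with a bright central region.
rng = np.random.default_rng(0)
scan = np.zeros((64, 64))
scan[16:48, 16:48] = 1.0
top, left, ch, cw = guided_crop_box(scan, rng)
crop = scan[top:top + ch, left:left + cw]
```

With plain random resized cropping, many crops of a scan with large empty borders would land mostly on background. The rejection step is one simple way to avoid that. The actual method may instead use segmentation masks or a different sampling distribution.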

