CV, CL | Jan 22, 2024

A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation

MILA, Oxford, Stanford
arXiv:2401.12208v2 | 67 citations | h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the need to streamline radiologists' workflows, showing incremental efficiency gains from a domain-specific model.

The authors address inefficient chest X-ray interpretation by developing a vision-language foundation model (CheXagent) trained on a large-scale dataset (CheXinstruct). The model achieved competitive performance on a novel benchmark (CheXbench), and radiology residents working from CheXagent-drafted reports saved 36% of their time without loss of quality.

Over 1.4 billion chest X-rays (CXRs) are performed annually due to their cost-effectiveness as an initial diagnostic test. This scale of radiological studies provides a significant opportunity to streamline CXR interpretation and documentation. While foundation models are a promising solution, the lack of publicly available large-scale datasets and benchmarks inhibits their iterative development and real-world evaluation. To overcome these challenges, we constructed a large-scale dataset (CheXinstruct), which we utilized to train a vision-language foundation model (CheXagent). We systematically demonstrated competitive performance across eight distinct task types on our novel evaluation benchmark (CheXbench). Beyond technical validation, we assessed the real-world utility of CheXagent in directly drafting radiology reports. Our clinical assessment with eight radiologists revealed a 36% time saving for residents using CheXagent-drafted reports, while attending radiologists showed no significant time difference editing resident-drafted or CheXagent-drafted reports. The CheXagent-drafted reports improved the writing efficiency of both radiology residents and attending radiologists in 81% and 61% of cases, respectively, without loss of quality. Overall, we demonstrate that CheXagent can effectively perform a variety of CXR interpretation tasks and holds potential to assist radiologists in routine clinical workflows.

Code Implementations: 1 repo
