CLSep 14, 2023
Adapted Large Language Models Can Outperform Medical Experts in Clinical Text SummarizationDave Van Veen, Cara Van Uden, Louis Blankemeier et al.
Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Quantitative assessments with syntactic, semantic, and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
CVJun 1, 2023
Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease ClassificationCara Van Uden, Christian Bluethgen, Maayane Attias et al.
Interstitial lung diseases (ILD) present diagnostic challenges due to their varied manifestations and overlapping imaging features. To address this, we propose a machine learning approach that utilizes CLIP, a multimodal (image and text) self-supervised model, for ILD classification. We extensively integrate zero-shot CLIP throughout our workflow, starting from the initial extraction of image patches from volumetric CT scans and proceeding to ILD classification using "patch montages". Furthermore, we investigate how domain adaptive pretraining (DAPT) CLIP with task-specific images (CT "patch montages" extracted with ILD-specific prompts for CLIP) and/or text (lung-specific sections of radiology reports) affects downstream ILD classification performance. By leveraging CLIP-extracted "patch montages" and DAPT, we achieve strong zero-shot ILD classification results, including an AUROC of 0.893, without the need for any labeled training data. This work highlights the versatility and potential of multimodal models like CLIP for medical image classification tasks where labeled data is scarce.
CVMay 13, 2023
How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare SystemsCara Van Uden, Jeremy Irvin, Mars Huang et al.
Self-supervised learning (SSL) enables label efficient training for machine learning models. This is essential for domains such as medical imaging, where labels are costly and time-consuming to curate. However, the most effective supervised or SSL strategy for transferring models to different healthcare systems or novel tasks is not well understood. In this work, we systematically experiment with a variety of supervised and self-supervised pretraining strategies using multimodal datasets of medical images (chest X-rays) and text (radiology reports). We then evaluate their performance on data from two external institutions with diverse sets of tasks. In addition, we experiment with different transfer learning strategies to effectively adapt these pretrained models to new tasks and healthcare systems. Our empirical results suggest that multimodal SSL gives substantial gains over unimodal SSL in performance across new healthcare systems and tasks, comparable to models pretrained with full supervision. We demonstrate additional performance gains with models further adapted to the new dataset and task, using multimodal domain-adaptive pretraining (DAPT), linear probing then finetuning (LP-FT), and both methods combined. We offer suggestions for alternative models to use in scenarios where not all of these additions are feasible. Our results provide guidance for improving the generalization of medical image interpretation models to new healthcare systems and novel tasks.
CLMay 2, 2023
RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language ModelsDave Van Veen, Cara Van Uden, Maayane Attias et al.
We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, or clinical text) and via discrete prompting or parameter-efficient fine-tuning. Our results consistently achieve best performance by maximally adapting to the task via pretraining on clinical text and fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.