CL, AI | Feb 28

Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models

Mengxian Lyu, Cheng Peng, Ziyi Chen, Mengyuan Zhang, Jieting Li Lu, Yonghui Wu

arXiv:2603.19275 | h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses the burden on physicians by improving summarization accuracy, though it is incremental as it builds on existing pre-training and fine-tuning strategies.

The study tackles automatic summarization of radiology reports by proposing a mid-training method for large language models. The resulting model, GatorTronT5-Radio, achieved the best performance, improving ROUGE-L and RadGraph-F1 scores over models without mid-training.

Automatic summarization of radiology reports is an essential application for reducing the burden on physicians. Previous studies have widely used the "pre-training, fine-tuning" strategy to adapt large language models (LLMs) for summarization. This study proposes subdomain adaptation through a mid-training method to improve summarization. We explored three adaptation strategies: (1) general-domain pre-training, (2) clinical-domain pre-training, and (3) clinical-domain pre-training followed by subdomain mid-training. We developed models using large-scale clinical text from University of Florida (UF) Health and conducted mid-training and fine-tuning experiments using widely used benchmark datasets, including OpenI and MIMIC-CXR. The experimental results show that the mid-trained model, GatorTronT5-Radio, achieved the best performance, outperforming models without mid-training on both text-based measures (ROUGE-L) and factuality measures (RadGraph-F1). Our mid-training methods also demonstrate better few-shot learning and could alleviate the "cold start" problem, which previous studies reported as a learning barrier. Our findings support the use of "pre-training, mid-training, fine-tuning" instead of the widely used direct fine-tuning strategy.
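
For readers unfamiliar with the text-based measure named in the abstract, the following is a minimal sketch of a sentence-level ROUGE-L F1 score, computed from the longest common subsequence (LCS) of tokens. It is an illustration only: the abstract does not say which ROUGE implementation the authors used, the whitespace tokenization and lowercasing are simplifying assumptions, and the function names `lcs_length` and `rouge_l_f1` are hypothetical. RadGraph-F1 is not sketched here because it depends on a separate entity and relation extraction model.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS found for the two prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of the two neighbors.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L F1 (assumption: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up example strings, not taken from OpenI or MIMIC-CXR.
    reference = "no acute cardiopulmonary process"
    candidate = "no acute process"
    print(round(rouge_l_f1(reference, candidate), 4))
```

In this made-up example the LCS has three tokens, so precision is 3/3 and recall is 3/4. Published ROUGE-L numbers usually come from an established package with its own tokenization and stemming rules, so scores from this sketch are not directly comparable to those reported for GatorTronT5-Radio.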

