CVJul 19, 2023Code
Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report GenerationAaron Nicolson, Jason Dowling, Bevan Koopman
Radiologists face high burnout rates, partially due to the increasing volume of Chest X-rays (CXRs) requiring interpretation and reporting. Automated CXR report generation holds promise for reducing this burden and improving patient care. While current models show potential, their diagnostic accuracy is limited. Our proposed CXR report generator integrates elements of the radiologist workflow and introduces a novel reward for reinforcement learning. Our approach leverages longitudinal data from a patient's prior CXR study and effectively handles cases where no prior study exist, thus mirroring the radiologist's workflow. In contrast, existing models typically lack this flexibility, often requiring prior studies for the model to function optimally. Our approach also incorporates all CXRs from a patient's study and distinguishes between report sections through section embeddings. Our reward for reinforcement learning leverages CXR-BERT, which forces our model to learn the clinical semantics of radiology reporting. We conduct experiments on publicly available datasets -- MIMIC-CXR and Open-i IU X-ray -- with metrics shown to more closely correlate with radiologists' assessment of reporting. Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models, such as those utilising large language models, reinforcement learning, and multi-task learning. The proposed model improves the diagnostic accuracy of CXR report generation, which could one day reduce radiologists' workload and enhance patient care. Our Hugging Face checkpoint (https://huggingface.co/aehrc/cxrmate) and code (https://github.com/aehrc/cxrmate) are publicly available.
CVAug 7, 2024Code
e-Health CSIRO at RRG24: Entropy-Augmented Self-Critical Sequence Training for Radiology Report GenerationAaron Nicolson, Jinghui Liu, Jason Dowling et al.
The Shared Task on Large-Scale Radiology Report Generation (RRG24) aims to expedite the development of assistive systems for interpreting and reporting on chest X-ray (CXR) images. This task challenges participants to develop models that generate the findings and impression sections of radiology reports from CXRs from a patient's study, using five different datasets. This paper outlines the e-Health CSIRO team's approach, which achieved multiple first-place finishes in RRG24. The core novelty of our approach lies in the addition of entropy regularisation to self-critical sequence training, to maintain a higher entropy in the token distribution. This prevents overfitting to common phrases and ensures a broader exploration of the vocabulary during training, essential for handling the diversity of the radiology reports in the RRG24 datasets. Our model is available on Hugging Face https://huggingface.co/aehrc/cxrmate-rrg24.
CLJul 3, 2024Code
e-Health CSIRO at "Discharge Me!" 2024: Generating Discharge Summary Sections with Fine-tuned Language ModelsJinghui Liu, Aaron Nicolson, Jason Dowling et al.
Clinical documentation is an important aspect of clinicians' daily work and often demands a significant amount of time. The BioNLP 2024 Shared Task on Streamlining Discharge Documentation (Discharge Me!) aims to alleviate this documentation burden by automatically generating discharge summary sections, including brief hospital course and discharge instruction, which are often time-consuming to synthesize and write manually. We approach the generation task by fine-tuning multiple open-sourced language models (LMs), including both decoder-only and encoder-decoder LMs, with various configurations on input context. We also examine different setups for decoding algorithms, model ensembling or merging, and model specialization. Our results show that conditioning on the content of discharge summary prior to the target sections is effective for the generation task. Furthermore, we find that smaller encoder-decoder LMs can work as well or even slightly better than larger decoder based LMs fine-tuned through LoRA. The model checkpoints from our team (aehrc) are openly available.
IVAug 18, 2024
Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned PairsBowen Xin, Tony Young, Claire E Wainwright et al.
Medical image synthesis generates additional imaging modalities that are costly, invasive or harmful to acquire, which helps to facilitate the clinical workflow. When training pairs are substantially misaligned (e.g., lung MRI-CT pairs with respiratory motion), accurate image synthesis remains a critical challenge. Recent works explored the directional registration module to adjust misalignment in generative adversarial networks (GANs); however, substantial misalignment will lead to 1) suboptimal data mapping caused by correspondence ambiguity, and 2) degraded image fidelity caused by morphology influence on discriminators. To address the challenges, we propose a novel Deformation-aware GAN (DA-GAN) to dynamically correct the misalignment during the image synthesis based on multi-objective inverse consistency. Specifically, in the generative process, three levels of inverse consistency cohesively optimise symmetric registration and image generation for improved correspondence. In the adversarial process, to further improve image fidelity under misalignment, we design deformation-aware discriminators to disentangle the mismatched spatial morphology from the judgement of image fidelity. Experimental results show that DA-GAN achieved superior performance on a public dataset with simulated misalignments and a real-world lung MRI-CT dataset with respiratory motion misalignment. The results indicate the potential for a wide range of medical image synthesis tasks such as radiotherapy planning.
48.3CVApr 21
Toward Clinically Acceptable Chest X-ray Report Generation: A Qualitative Retrospective Pilot Study of CXRMate-2Aaron Nicolson, Elizabeth J. Cooper, Hwan-Jin Yoon et al.
Chest X-ray (CXR) radiology report generation (RRG) models have shown rapid progress, yet their clinical utility remains uncertain due to limited evaluation by radiologists. We present CXRMate-2, a state-of-the-art CXR RRG model that integrates structured multimodal conditioning and reinforcement learning with a composite reward for semantic alignment with radiologist reports. Across the MIMIC-CXR, CheXpert Plus, and ReXgradient datasets, CXRMate-2 achieves statistically significant improvements over strong benchmarks, including gains of 11.2% and 24.4% in GREEN and RadGraph-XL, respectively, on MIMIC-CXR relative to MedGemma 1.5 (4B). To directly compare CXRMate-2 against radiologist reporting, we conduct a blinded, randomised qualitative retrospective evaluation. Three consultant radiologists compare generated and radiologist reports across 120 studies from the MIMIC-CXR test set. Generated reports were deemed acceptable (defined as preferred or rated equally to radiologist reports) in 45% of ratings, with no statistically significant difference in preference rates between radiologist reports and acceptable generated reports for seven of the eight analysed findings. Preference for radiologist reports was driven primarily by higher recall, while generated reports were often preferred for readability. Together, these results suggest a credible pathway to clinically acceptable CXR RRG. Improvements in recall, alongside better detection of subtle findings (e.g., pulmonary congestion), are likely sufficient to achieve non-inferiority to radiologist reporting. With these targeted advances, CXR RRG systems may be ready for prospective evaluation in assistive roles within radiologist-led workflows.
28.7MED-PHMay 13
Generating synthetic computed tomography for radiotherapy: SynthRAD2025 challenge reportViktor Rogowski, Maarten L. Terpstra, Niklas Wahl et al.
Radiation therapy (RT) requires precise dose delivery over multiple fractions, with CT fundamental for treatment planning due to its electron density information. Repeated CT acquisitions impose radiation exposure and logistical burdens, MRI lacks electron density, and cone-beam CT (CBCT) requires correction for dose calculation. Synthetic CT (sCT) generation addresses these by converting MRI or CBCT into CT-equivalent images with accurate Hounsfield Unit (HU) values, enabling MRI-only RT and CBCT-based adaptive workflows. Building on SynthRAD2023, SynthRAD2025 benchmarked sCT methods on 2,362 patients from five European centers across head and neck, thorax, and abdomen. Two tasks: MRI-to-CT (890 cases) and CBCT-to-CT (1,472 cases), evaluated via image similarity (MAE, PSNR, MS-SSIM), segmentation (Dice, HD95), and dosimetric metrics from photon and proton plans. With 803 participants and 12/13 valid submissions, Task 1 top performance reached MAE $64.8\pm21.3$ HU, PSNR $\sim$30 dB, MS-SSIM $\sim$0.936, Dice 0.79, photon $γ_{2\%/2\text{mm}}>98\%$, proton $γ\approx85\%$. Task 2 improved: MAE $48.3\pm13.4$ HU, PSNR 32.6 dB, MS-SSIM 0.968, Dice 0.86, photon $γ>99\%$, proton $γ\approx89\%$. Strong image--segmentation correlations ($ρ=0.78$--$0.79$) but moderate dose correlations confirmed image quality is insufficient as a dosimetric surrogate. Head-and-neck cases were most consistent; thoracic and abdominal cases showed greater variability. Residual errors at tissue interfaces propagate along beam paths, affecting proton dose more than photon. SynthRAD2025 demonstrates that deep learning yields clinically relevant sCTs, especially for CBCT-to-CT, while identifying persistent MRI-to-CT challenges and underscoring dose-based evaluation as essential for clinical validation.
CVJan 24, 2022Code
Improving Chest X-Ray Report Generation by Leveraging Warm StartingAaron Nicolson, Jason Dowling, Bevan Koopman
Automatically generating a report from a patient's Chest X-Rays (CXRs) is a promising solution to reducing clinical workload and improving patient care. However, current CXR report generators -- which are predominantly encoder-to-decoder models -- lack the diagnostic accuracy to be deployed in a clinical setting. To improve CXR report generation, we investigate warm starting the encoder and decoder with recent open-source computer vision and natural language processing checkpoints, such as the Vision Transformer (ViT) and PubMedBERT. To this end, each checkpoint is evaluated on the MIMIC-CXR and IU X-Ray datasets. Our experimental investigation demonstrates that the Convolutional vision Transformer (CvT) ImageNet-21K and the Distilled Generative Pre-trained Transformer 2 (DistilGPT2) checkpoints are best for warm starting the encoder and decoder, respectively. Compared to the state-of-the-art ($\mathcal{M}^2$ Transformer Progressive), CvT2DistilGPT2 attained an improvement of 8.3\% for CE F-1, 1.8\% for BLEU-4, 1.6\% for ROUGE-L, and 1.0\% for METEOR. The reports generated by CvT2DistilGPT2 have a higher similarity to radiologist reports than previous approaches. This indicates that leveraging warm starting improves CXR report generation. Code and checkpoints for CvT2DistilGPT2 are available at https://github.com/aehrc/cvt2distilgpt2.
CVMar 31, 2025
IMPACT: A Generic Semantic Loss for Multimodal Medical Image RegistrationValentin Boussot, Cédric Hémon, Jean-Claude Nunes et al.
Image registration is fundamental in medical imaging, enabling precise alignment of anatomical structures for diagnosis, treatment planning, image-guided interventions, and longitudinal monitoring. This work introduces IMPACT (Image Metric with Pretrained model-Agnostic Comparison for Transmodality registration), a novel similarity metric designed for robust multimodal image registration. Rather than relying on raw intensities, handcrafted descriptors, or task-specific training, IMPACT defines a semantic similarity measure based on the comparison of deep features extracted from large-scale pretrained segmentation models. By leveraging representations from models such as TotalSegmentator, Segment Anything (SAM), and other foundation networks, IMPACT provides a task-agnostic, training-free solution that generalizes across imaging modalities. These features, originally trained for segmentation, offer strong spatial correspondence and semantic alignment capabilities, making them naturally suited for registration. The method integrates seamlessly into both algorithmic (Elastix) and learning-based (VoxelMorph) frameworks, leveraging the strengths of each. IMPACT was evaluated on five challenging 3D registration tasks involving thoracic CT/CBCT and pelvic MR/CT datasets. Quantitative metrics, including Target Registration Error and Dice Similarity Coefficient, demonstrated consistent improvements in anatomical alignment over baseline methods. Qualitative analyses further highlighted the robustness of the proposed metric in the presence of noise, artifacts, and modality variations. With its versatility, efficiency, and strong performance across diverse tasks, IMPACT offers a powerful solution for advancing multimodal image registration in both clinical and research settings.
IVJun 30, 2025
Artificial Intelligence-assisted Pixel-level Lung (APL) Scoring for Fast and Accurate Quantification in Ultra-short Echo-time MRIBowen Xin, Rohan Hickey, Tamara Blake et al.
Lung magnetic resonance imaging (MRI) with ultrashort echo-time (UTE) represents a recent breakthrough in lung structure imaging, providing image resolution and quality comparable to computed tomography (CT). Due to the absence of ionising radiation, MRI is often preferred over CT in paediatric diseases such as cystic fibrosis (CF), one of the most common genetic disorders in Caucasians. To assess structural lung damage in CF imaging, CT scoring systems provide valuable quantitative insights for disease diagnosis and progression. However, few quantitative scoring systems are available in structural lung MRI (e.g., UTE-MRI). To provide fast and accurate quantification in lung MRI, we investigated the feasibility of novel Artificial intelligence-assisted Pixel-level Lung (APL) scoring for CF. APL scoring consists of 5 stages, including 1) image loading, 2) AI lung segmentation, 3) lung-bounded slice sampling, 4) pixel-level annotation, and 5) quantification and reporting. The results shows that our APL scoring took 8.2 minutes per subject, which was more than twice as fast as the previous grid-level scoring. Additionally, our pixel-level scoring was statistically more accurate (p=0.021), while strongly correlating with grid-level scoring (R=0.973, p=5.85e-9). This tool has great potential to streamline the workflow of UTE lung MRI in clinical settings, and be extended to other structural lung MRI sequences (e.g., BLADE MRI), and for other lung diseases (e.g., bronchopulmonary dysplasia).
CVJun 19, 2024
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate ItAaron Nicolson, Shengyao Zhuang, Jason Dowling et al.
This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation. Traditionally, CXR report generation relies solely on CXR images and limited radiology data, overlooking valuable information from patient health records, particularly from emergency departments. Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we incorporate detailed patient information such as vital signs, medicines, and clinical history to enhance diagnostic accuracy. We introduce a novel approach to transform these heterogeneous data sources into embeddings that prompt a multimodal language model; this significantly enhances the diagnostic accuracy of generated radiology reports. Our comprehensive evaluation demonstrates the benefits of using a broader set of patient data, underscoring the potential for enhanced diagnostic capabilities and better patient outcomes through the integration of multimodal data in CXR report generation.
IVJun 12, 2024
Evaluating the Impact of Sequence Combinations on Breast Tumor Segmentation in Multiparametric MRIHang Min, Gorane Santamaria Hormaechea, Prabhakar Ramachandran et al.
Multiparametric magnetic resonance imaging (mpMRI) is a key tool for assessing breast cancer progression. Although deep learning has been applied to automate tumor segmentation in breast MRI, the effect of sequence combinations in mpMRI remains under-investigated. This study explores the impact of different combinations of T2-weighted (T2w), dynamic contrast-enhanced MRI (DCE-MRI) and diffusion-weighted imaging (DWI) with apparent diffusion coefficient (ADC) map on breast tumor segmentation using nnU-Net. Evaluated on a multicenter mpMRI dataset, the nnU-Net model using DCE sequences achieved a Dice similarity coefficient (DSC) of 0.69 $\pm$ 0.18 for functional tumor volume (FTV) segmentation. For whole tumor mask (WTM) segmentation, adding the predicted FTV to DWI and ADC map improved the DSC from 0.57 $\pm$ 0.24 to 0.60 $\pm$ 0.21. Adding T2w did not yield significant improvement, which still requires further investigation under a more standardized imaging protocol. This study serves as a foundation for future work on predicting breast cancer treatment response using mpMRI.
IVSep 7, 2020
Going deeper with brain morphometry using neural networksRodrigo Santa Cruz, Léo Lebrat, Pierrick Bourgeat et al.
Brain morphometry from magnetic resonance imaging (MRI) is a consolidated biomarker for many neurodegenerative diseases. Recent advances in this domain indicate that deep convolutional neural networks can infer morphometric measurements within a few seconds. Nevertheless, the accuracy of the devised model for insightful bio-markers (mean curvature and thickness) remains unsatisfactory. In this paper, we propose a more accurate and efficient neural network model for brain morphometry named HerstonNet. More specifically, we develop a 3D ResNet-based neural network to learn rich features directly from MRI, design a multi-scale regression scheme by predicting morphometric measures at feature maps of different resolutions, and leverage a robust optimization method to avoid poor quality minima and reduce the prediction variance. As a result, HerstonNet improves the existing approach by 24.30% in terms of intraclass correlation coefficient (agreement measure) to FreeSurfer silver-standards while maintaining a competitive run-time.