IV CV MED-PH Jan 29, 2025

Trustworthy image-to-image translation: evaluating uncertainty calibration in unpaired training scenarios

Ciaran Bench, Emir Ahmed, Spencer A. Thomas

arXiv:2501.17570v111.32 citations h-index: 6 IJCNN

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying trustworthy automated diagnostic tools in healthcare, where data limitations hinder model evaluation, though it is incremental in applying existing uncertainty methods to unpaired training.

The paper tackles the problem of evaluating uncertainty calibration in unpaired image-to-image translation models for medical imaging, specifically mammography. It proposes a scheme to assess model trustworthiness without ground truth data, and reports that the diffusion-based SynDiff outperforms the GAN-based cycleGAN on calibration metrics.

Mammographic screening is an effective method for detecting breast cancer, facilitating early diagnosis. However, the current need to manually inspect images places a heavy burden on healthcare systems, spurring a desire for automated diagnostic protocols. Techniques based on deep neural networks have been shown effective in some studies, but their tendency to overfit leaves considerable risk for poor generalisation and misdiagnosis, preventing their widespread adoption in clinical settings. Data augmentation schemes based on unpaired neural style transfer models have been proposed that improve generalisability by diversifying the representations of training image features in the absence of paired training data (images of the same tissue in either image style). But these models are similarly prone to various pathologies, and evaluating their performance is challenging without ground truths/large datasets (as is often the case in medical imaging). Here, we consider two frameworks/architectures: a GAN-based cycleGAN, and the more recently developed diffusion-based SynDiff. We evaluate their performance when trained on image patches parsed from three open access mammography datasets and one non-medical image dataset. We consider the use of uncertainty quantification to assess model trustworthiness, and propose a scheme to evaluate calibration quality in unpaired training scenarios. This ultimately helps facilitate the trustworthy use of image-to-image translation models in domains where ground truths are not typically available.
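
To make the term "calibration" concrete, here is a minimal Python sketch of a generic coverage-based check. It is not the scheme proposed in the paper, which the abstract does not detail. It assumes Gaussian predictive distributions (a mean and standard deviation per pixel) and access to reference values, which is exactly what unpaired settings usually lack. The function names `empirical_coverage` and `calibration_error` are hypothetical.

```python
from statistics import NormalDist


def empirical_coverage(means, stds, targets, level):
    """Fraction of targets inside the central `level` interval of N(mean, std^2)."""
    # Half-width of the central interval in units of standard deviation.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(abs(t - m) <= z * s for m, s, t in zip(means, stds, targets))
    return hits / len(targets)


def calibration_error(means, stds, targets, levels=(0.5, 0.8, 0.9, 0.95)):
    """Mean absolute gap between nominal and empirical coverage (0 is ideal)."""
    gaps = [abs(empirical_coverage(means, stds, targets, p) - p) for p in levels]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the paper.
    means = [0.0, 0.0, 0.0, 0.0]
    stds = [1.0, 1.0, 1.0, 1.0]
    targets = [0.0, 0.5, 1.0, 3.0]
    print(empirical_coverage(means, stds, targets, 0.5))
    print(calibration_error(means, stds, targets))
```

A well-calibrated model has empirical coverage close to the nominal level at every level. Coverage well below nominal suggests overconfident uncertainty estimates, and coverage well above suggests underconfident ones.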

