IV CV Aug 12, 2024

Five Pitfalls When Assessing Synthetic Medical Images with Reference Metrics

Melanie Dohmen, Tuan Truong, Ivo M. Baltruschat, Matthias Lenga

arXiv:2408.06075v2, 11.910 citations, h-index: 6

Originality: Synthesis-oriented

AI Analysis

This work addresses a critical problem for medical imaging researchers: it highlights limitations in common evaluation methods that are incremental in scope but important in practice.

The paper identifies five pitfalls in using standard reference metrics such as SSIM, PSNR, and MAE to evaluate synthetic medical images. These metrics often fail because medical images differ from natural images in content, data format, and interpretation. The paper also discusses strategies to avoid these issues.

Reference metrics have been developed to objectively and quantitatively compare two images. Especially for evaluating the quality of reconstructed or compressed images, these metrics have proven very useful. Extensive tests of such metrics on benchmarks of artificially distorted natural images have revealed which metrics correlate best with human perception of quality. Direct transfer of these metrics to the evaluation of generative models in medical imaging, however, can easily lead to pitfalls, because assumptions about image content, image data format and image interpretation are often very different. Also, the correlation of reference metrics and human perception of quality can vary strongly for different kinds of distortions, and commonly used metrics such as SSIM, PSNR and MAE are not the best choice for all situations. We selected five pitfalls that showcase unexpected and probably undesired reference metric scores and discuss strategies to avoid them.
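To make the point about image data format concrete, here is a minimal sketch of how two of the metrics named above react to an assumption about intensity range. It is an illustration only and is not claimed to reproduce any of the paper's five pitfalls. The 12-bit test image, the noise level, and the helper names `mae` and `psnr` are assumptions made for this example.

```python
import numpy as np


def mae(reference, test):
    """Mean absolute error between two images of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB for a caller-supplied data range."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


rng = np.random.default_rng(0)

# Hypothetical 12-bit image (values 0..4095), as is common for medical
# modalities, plus additive Gaussian noise standing in for a synthetic image.
reference = rng.integers(0, 4096, size=(64, 64)).astype(np.float64)
test = reference + rng.normal(0.0, 20.0, size=reference.shape)

# Same image pair, two different assumptions about the intensity range.
psnr_12bit = psnr(reference, test, data_range=4095.0)
psnr_8bit_assumed = psnr(reference, test, data_range=255.0)

# MAE is expressed in intensity units, so rescaling the images rescales it.
mae_raw = mae(reference, test)
mae_unit_range = mae(reference / 4095.0, test / 4095.0)

print(f"PSNR, data_range=4095: {psnr_12bit:.2f} dB")
print(f"PSNR, data_range=255:  {psnr_8bit_assumed:.2f} dB")
print(f"MAE raw: {mae_raw:.3f}, MAE after scaling to [0, 1]: {mae_unit_range:.6f}")
```

The two PSNR values differ by exactly 20·log10(4095/255), about 24.1 dB, although the image pair is identical. Scores computed under different range or normalization conventions are therefore not directly comparable, so the convention used should be reported alongside the metric.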
