MED-PH · AI · CV · Mar 1, 2025

AI-Augmented Thyroid Scintigraphy for Robust Classification

Maziar Sabouri, Ghasem Hajianfar, Alireza Rafiei Sardouei, Milad Yazdani, Azin Asadzadeh, Soroush Bagheri, Mohsen Arabi, Seyed Rasoul Zakavi, Emran Askari, Atena Aghaee, Sam Wiseman, Dena Shahriari

arXiv:2503.00366v2 · 1.2 · h-index: 24

Originality: Incremental advance

AI Analysis

This work aims to improve AI-based diagnostic accuracy for thyroid disorders. It is incremental: it strengthens existing classification methods with data augmentation rather than introducing a new paradigm.

This study tackles limited and imbalanced datasets in deep learning classification of thyroid scintigraphy by comparing data augmentation strategies. Flow Matching-based augmentation achieved the highest classification accuracy and the lowest FID/KID scores, and O+FM+CA yielded the most balanced performance across all classes.
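For reference, the standard definitions of the two image fidelity metrics are given below; lower values mean the synthetic images are statistically closer to the real ones. Both are computed on feature embeddings of real and generated images, conventionally Inception-v3 pool features. The abstract does not state which feature extractor, kernel settings, or sample sizes the authors used, so these are the common defaults rather than their confirmed configuration. Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of real and generated features, $x_1, \dots, x_m$ are real features, $y_1, \dots, y_n$ are generated features, and $d$ is the feature dimension.

```latex
\begin{aligned}
\mathrm{FID} &= \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right) \\
k(x, y) &= \left(\tfrac{1}{d}\, x^\top y + 1\right)^{3} \\
\mathrm{KID} &= \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j)
  + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j)
  - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j)
\end{aligned}
```

KID is the unbiased squared maximum mean discrepancy under the cubic polynomial kernel $k$. Unlike FID, it does not assume Gaussian features and is unbiased for small sample sizes, which is relevant when some diagnostic classes contribute few images.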

Purpose: Thyroid scintigraphy plays a vital role in diagnosing a range of thyroid disorders. While deep learning classification models hold significant promise in this domain, their effectiveness is frequently compromised by limited and imbalanced datasets. This study investigates the impact of three data augmentation strategies, Stable Diffusion (SD), Flow Matching (FM), and Conventional Augmentation (CA), on the performance of a ResNet18 classifier.

Methods: Anterior thyroid scintigraphy images from 2,954 patients across nine medical centers were classified into four categories: Diffuse Goiter (DG), Nodular Goiter (NG), Normal (NL), and Thyroiditis (TI). Data augmentation was performed using various SD and FM models, resulting in 18 distinct augmentation scenarios. Each augmented dataset was used to train a ResNet18 classifier. Classifier performance was assessed with class-wise and average precision, recall, F1-score, and AUC, and the fidelity of the synthetic images was assessed with FID and KID.

Results: FM-based augmentation outperformed all other methods, achieving the highest classification accuracy and the lowest FID/KID scores, indicating both improved model generalization and realistic image synthesis. SD1, which combines image and prompt inputs at inference, was the most effective SD variant, suggesting that physician-generated prompts provide meaningful clinical context. O+FM+CA yielded the most balanced and robust performance across all classes.

Conclusion: Integrating FM and clinically informed SD augmentation, especially when guided by expert prompts, substantially improves thyroid scintigraphy classification. These findings highlight the importance of leveraging both structured medical input and advanced generative models for more effective training on limited datasets.
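The abstract reports class-wise and average precision, recall, F1-score, and AUC but does not say how the averages were formed. The sketch below assumes macro (unweighted) averaging over the four classes and one-vs-rest AUC. Both are common choices under class imbalance, because every class counts equally regardless of its size. The function names (`classwise_prf`, `macro_average`, `ovr_auc`) and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Dict, Sequence

# The four diagnostic classes named in the abstract.
CLASSES = ["DG", "NG", "NL", "TI"]  # Diffuse Goiter, Nodular Goiter, Normal, Thyroiditis


def classwise_prf(y_true: Sequence[str], y_pred: Sequence[str],
                  classes: Sequence[str] = CLASSES) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F1, treating each class one-vs-rest."""
    scores: Dict[str, Dict[str, float]] = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_average(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over classes (assumed averaging scheme)."""
    n = len(scores)
    return {m: sum(s[m] for s in scores.values()) / n
            for m in ("precision", "recall", "f1")}


def ovr_auc(y_true: Sequence[str], probs: Sequence[Dict[str, float]], c: str) -> float:
    """One-vs-rest AUC for class c, via the Mann-Whitney pairwise count."""
    pos = [p[c] for t, p in zip(y_true, probs) if t == c]
    neg = [p[c] for t, p in zip(y_true, probs) if t != c]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A positive ranked above a negative scores 1, a tie scores 0.5.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels for six hypothetical test images (not data from the study).
    y_true = ["DG", "NG", "NL", "TI", "NG", "NL"]
    y_pred = ["DG", "NG", "NL", "NL", "NG", "TI"]
    per_class = classwise_prf(y_true, y_pred)
    for c in CLASSES:
        print(c, per_class[c])
    print("macro", macro_average(per_class))
```

Here the macro F1 is the mean of per-class F1 scores, not the F1 of the macro precision and recall. The two can differ, so a report should say which one it uses. In practice, scikit-learn's `precision_recall_fscore_support` (with `average=None` or `average="macro"`) and `roc_auc_score` (with `multi_class="ovr"`) compute the same quantities.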

