MED-PHAICVMar 1, 2025

AI-Augmented Thyroid Scintigraphy for Robust Classification

arXiv:2503.00366v2h-index: 24
AI Analysis

This work addresses the challenge of improving diagnostic accuracy for thyroid disorders using AI, but it is incremental as it focuses on enhancing existing methods with data augmentation rather than introducing a new paradigm.

This study tackled the problem of limited and imbalanced datasets in deep learning classification for thyroid scintigraphy by investigating data augmentation strategies, finding that Flow Matching-based augmentation achieved the highest classification accuracy and lowest FID/KID scores, with O+FM+CA yielding the most balanced performance across all classes.

Purpose: Thyroid scintigraphy plays a vital role in diagnosing a range of thyroid disorders. While deep learning classification models hold significant promise in this domain, their effectiveness is frequently compromised by limited and imbalanced datasets. This study investigates the impact of three data augmentation strategies including Stable Diffusion (SD), Flow Matching (FM), and Conventional Augmentation (CA), on enhancing the performance of a ResNet18 classifier. Methods: Anterior thyroid scintigraphy images from 2,954 patients across nine medical centers were classified into four categories: Diffuse Goiter (DG), Nodular Goiter (NG), Normal (NL), and Thyroiditis (TI). Data augmentation was performed using various SD and FM models, resulting in 18 distinct augmentation scenarios. Each augmented dataset was used to train a ResNet18 classifier. Model performance was assessed using class-wise and average precision, recall, F1-score, AUC, and image fidelity metrics (FID and KID). Results: FM-based augmentation outperformed all other methods, achieving the highest classification accuracy and lowest FID/KID scores, indicating both improved model generalization and realistic image synthesis. SD1, combining image and prompt inputs in the inference process, was the most effective SD variant, suggesting that physician-generated prompts provide meaningful clinical context. O+FM+CA yielded the most balanced and robust performance across all classes. Conclusion: Integrating FM and clinically-informed SD augmentation, especially when guided by expert prompts, substantially improves thyroid scintigraphy classification. These findings highlight the importance of leveraging both structured medical input and advanced generative models for more effective training on limited datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes