Minimum Data, Maximum Impact: 20 annotated samples for explainable lung nodule classification
This work addresses the scarcity of annotated data for explainable AI in medical imaging, though its contribution is incremental because it builds on existing generative and explainable methods.
The authors address the scarcity of attribute-annotated medical image datasets for explainable lung nodule classification by synthesizing data with a generative model. Adding the synthetic data increased attribute prediction accuracy by 13.4% and target prediction accuracy by 1.8% compared to training with only a small real dataset.
Classification models that provide human-interpretable explanations enhance clinicians' trust in, and the usability of, AI-assisted medical image diagnosis. One research focus is to predict, alongside the diagnosis, the pathology-related visual attributes that radiologists use, aligning AI decision-making with clinical reasoning. Radiologists use attributes like shape and texture as established diagnostic criteria, and mirroring these in AI decision-making both enhances transparency and enables explicit validation of model outputs. However, the adoption of such models is limited by the scarcity of large-scale medical image datasets annotated with these attributes. To address this challenge, we propose synthesizing attribute-annotated data using a generative model. We extend a diffusion model with attribute conditioning and train it using only 20 attribute-labeled lung nodule samples from the LIDC-IDRI dataset. Incorporating the generated images into the training of an explainable model boosts performance, increasing attribute prediction accuracy by 13.4% and target prediction accuracy by 1.8% compared to training with only the small real attribute-annotated dataset. This work highlights the potential of synthetic data to overcome dataset limitations, enhancing the applicability of explainable models in medical image analysis.
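As a rough illustration of what attribute conditioning could involve, the sketch below encodes attribute ratings as a conditioning vector and draws conditions for synthetic samples by resampling the ratings of the few real annotated nodules. It is not the authors' implementation: the attribute names, the 1 to 5 rating scale, the resampling strategy, and the function names `conditioning_vector` and `sample_conditions` are all assumptions made for this example, and the diffusion model itself is omitted.

```python
import random
from typing import Dict, List

# Hypothetical attribute names and rating range, chosen for illustration only.
ATTRIBUTES = ("shape", "texture")
RATING_MIN, RATING_MAX = 1, 5


def conditioning_vector(ratings: Dict[str, int]) -> List[float]:
    """Scale each attribute rating to [0, 1], in the fixed order of ATTRIBUTES."""
    span = RATING_MAX - RATING_MIN
    return [(ratings[name] - RATING_MIN) / span for name in ATTRIBUTES]


def sample_conditions(
    real_ratings: List[Dict[str, int]], n: int, seed: int = 0
) -> List[List[float]]:
    """Draw n conditioning vectors by resampling annotations of real samples.

    Assumption: synthetic images are requested with attribute combinations
    that already occur in the small real set. Other strategies are possible.
    """
    rng = random.Random(seed)
    return [conditioning_vector(rng.choice(real_ratings)) for _ in range(n)]


# Two made-up annotated nodules standing in for the real labeled subset.
real = [{"shape": 1, "texture": 5}, {"shape": 3, "texture": 2}]
conditions = sample_conditions(real, n=4)
```

Each vector in `conditions` would be passed to a conditional generator together with the noise input, and the resulting image would inherit the ratings it was conditioned on as its attribute labels.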