From Majority to Minority: A Diffusion-based Augmentation for Underrepresented Groups in Skin Lesion Analysis
This work addresses under-diagnosis of underrepresented groups in medical imaging, but its contribution is incremental: it builds on prior ideas of using majority-group data to supplement training for minority groups.
The paper tackles the problem of AI-based skin cancer diagnosis underperforming on minority groups because of insufficient training data. It proposes a diffusion-based augmentation framework that uses majority-group data to generate synthetic images, improving diagnostic results for minority groups even when little or no reference data is available.
AI-based diagnostic systems have demonstrated dermatologist-level performance in classifying skin cancer. However, such systems are prone to underperforming when tested on data from minority groups that lack sufficient representation in the training sets. Although data collection and annotation offer the best means of improving performance for minority groups, these processes are costly and time-consuming. Prior works have suggested that data from majority groups may serve as a valuable information source to supplement the training of diagnosis tools for minority groups. In this work, we propose an effective diffusion-based augmentation framework that maximizes the use of rich information from majority groups to benefit minority groups. Using groups with different skin types as a case study, our results show that the proposed framework can generate synthetic images that improve diagnostic results for the minority groups, even when there is little or no reference data from these target groups. The practical value of our work is evident in medical imaging analysis, where under-diagnosis persists as a problem for certain groups due to insufficient representation.