CV · Sep 28, 2024

Introducing SDICE: An Index for Assessing Diversity of Synthetic Medical Datasets

arXiv:2409.19436v1 · 2 citations · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses the understudied issue of diversity in synthetic medical images, which matters for data augmentation in medical image analysis. The contribution is incremental, since it builds on existing contrastive encoding methods.

The authors propose the SDICE index for assessing diversity in synthetic medical datasets. The index measures the distance between the similarity score distributions of real and synthetic images, with similarity scores computed by a contrastive encoder. They demonstrate its effectiveness on the MIMIC-chest X-ray and ImageNet datasets.

Advancements in generative modeling are pushing the state of the art in synthetic medical image generation. These synthetic images can serve as an effective data augmentation method to aid the development of more accurate machine learning models for medical image analysis. While the fidelity of these synthetic images has progressively increased, the diversity of these images is an understudied phenomenon. In this work, we propose the SDICE index, which is based on the characterization of similarity distributions induced by a contrastive encoder. Given a synthetic dataset and a reference dataset of real images, the SDICE index measures the distance between the similarity score distributions of original and synthetic images, where the similarity scores are estimated using a pre-trained contrastive encoder. This distance is then normalized using an exponential function to provide a consistent metric that can be easily compared across domains. Experiments conducted on the MIMIC-chest X-ray and ImageNet datasets demonstrate the effectiveness of the SDICE index in assessing synthetic medical dataset diversity.
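The pipeline described in the abstract (similarity scores from an encoder, a distance between the real and synthetic score distributions, then exponential normalization) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the similarity measure, the distance, or the exact normalization, so the sketch assumes cosine similarity over all within-set image pairs, a 1-D Wasserstein distance approximated from quantiles, and the form exp(-alpha * d). The function names and the `alpha` and `n_quantiles` parameters are hypothetical, and the embeddings are assumed to come from some pre-trained contrastive encoder applied beforehand.

```python
import numpy as np


def pairwise_similarities(emb):
    """Cosine similarity for every distinct pair of rows in `emb`.

    `emb` is an (n, d) array of embeddings, assumed to be produced by a
    pre-trained contrastive encoder (the encoder itself is out of scope).
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    upper = np.triu_indices(len(emb), k=1)  # skip self-pairs and duplicates
    return sims[upper]


def wasserstein_1d(x, y, n_quantiles=200):
    """Approximate 1-D Wasserstein distance by comparing quantiles.

    Assumption: the source only says "distance"; this choice is illustrative.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean(np.abs(np.quantile(x, q) - np.quantile(y, q))))


def diversity_index(real_emb, synth_emb, alpha=1.0):
    """Hypothetical SDICE-style score in (0, 1]; higher means the synthetic
    similarity distribution is closer to the real one.

    Assumption: exp(-alpha * d) stands in for the unspecified exponential
    normalization mentioned in the abstract.
    """
    d = wasserstein_1d(
        pairwise_similarities(real_emb),
        pairwise_similarities(synth_emb),
    )
    return float(np.exp(-alpha * d))
```

Under these assumptions, a synthetic set whose images are near-duplicates of one another yields pairwise similarities close to 1, which moves its distribution away from that of a varied real set and lowers the score. A synthetic set as varied as the real one scores close to 1.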

