Unleashing the Potential of Synthetic Images: A Study on Histopathology Image Classification
This work addresses the cost and scarcity of annotated histopathology images for medical diagnosis. Its contribution is incremental, however, because it builds on existing generative methods.
The study addresses the shortage of histopathology image data by evaluating generative models for creating synthetic images. It finds that diffusion models are effective for transfer learning, that GANs are better suited for augmentation, and that transformer-based models do not require filtering. Used this way, the synthetic images improve classification performance on the PCam dataset.
Histopathology image classification is crucial for the accurate identification and diagnosis of various diseases, but it requires large and diverse datasets. Obtaining such datasets, however, is often costly and time-consuming due to the need for expert annotations and ethical constraints. To address this, we examine the suitability of different generative models and image selection approaches to create realistic synthetic histopathology image patches conditioned on class labels. Our findings highlight the importance of selecting an appropriate generative model type and architecture to enhance performance. Our experiments on the PCam dataset show that diffusion models are effective for transfer learning, while GAN-generated samples are better suited for augmentation. Additionally, transformer-based generative models do not require image filtering, in contrast to those derived from Convolutional Neural Networks (CNNs), which benefit from realism score-based selection. Overall, we show that synthetic images can effectively augment existing datasets, ultimately improving the performance of the downstream histopathology image classification task.
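The abstract does not spell out how realism score-based selection works. A minimal sketch follows, assuming the realism score introduced by Kynkäänniemi et al. (2019) is computed on feature embeddings that were already extracted for real and synthetic patches by some pretrained network. The function names, the neighborhood size `k=3`, and the acceptance threshold of 1.0 are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negatives from rounding


def realism_scores(real_feats: np.ndarray, fake_feats: np.ndarray, k: int = 3) -> np.ndarray:
    """Realism score of each synthetic embedding relative to the real embeddings."""
    # Radius of each real sample's k-NN hypersphere; column 0 of the sorted
    # distances is the sample itself (distance 0), so column k is its k-th neighbor.
    d_rr = pairwise_distances(real_feats, real_feats)
    radii = np.sort(d_rr, axis=1)[:, k]
    # Drop the real samples with the largest radii (keep those at or below the
    # median) so that outliers with huge hyperspheres do not inflate scores.
    keep = radii <= np.median(radii)
    d_fr = pairwise_distances(fake_feats, real_feats[keep])
    # Score = max over kept real samples of radius / distance; a value >= 1 means
    # the synthetic sample falls inside at least one real k-NN hypersphere.
    return np.max(radii[keep][None, :] / np.maximum(d_fr, 1e-12), axis=1)


def select_realistic(fake_feats, real_feats, k=3, threshold=1.0):
    """Return indices of synthetic samples whose realism score meets the
    (hypothetical) threshold, together with all scores."""
    scores = realism_scores(real_feats, fake_feats, k)
    return np.flatnonzero(scores >= threshold), scores
```

As a usage example, with real embeddings laid out on a 10×10 integer grid, a synthetic point at (5.01, 5.0) or (4.5, 4.5) is kept, while one at (100, 100) is rejected. Thresholding is only one possible selection rule; keeping the top-N synthetic images by score would be an equally plausible reading of "realism score-based selection."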