Stable Diffusion Dataset Generation for Downstream Classification Tasks
This work addresses data scarcity for machine learning practitioners by providing a method for creating effective synthetic data, although the contribution is incremental because it builds on existing generative models.
The paper tackles synthetic dataset generation for classification tasks by adapting Stable Diffusion 2.0 with Transfer Learning, Fine-Tuning and generation parameter optimisation. In a third of cases, models trained on the resulting synthetic datasets outperformed models trained on real datasets.
Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.
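The abstract does not say which generation parameters were optimised or which search procedure was used. As a purely illustrative sketch, the code below assumes two common Stable Diffusion settings (guidance scale and number of denoising steps) and an exhaustive grid search. The names `grid_search` and `toy_evaluate` are hypothetical, and `toy_evaluate` is a stand-in for the expensive real step of generating a synthetic dataset, training a classifier on it, and measuring accuracy on real validation data.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    evaluate: Callable[[float, int], float],
    guidance_scales: Iterable[float],
    step_counts: Iterable[int],
) -> Tuple[Dict[str, float], float]:
    """Try every (guidance scale, step count) pair and keep the best one.

    `evaluate` is assumed to return a score where higher is better,
    e.g. downstream classifier accuracy on a real validation set.
    """
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for scale, steps in product(guidance_scales, step_counts):
        score = evaluate(scale, steps)
        if score > best_score:
            best_params = {"guidance_scale": scale, "num_inference_steps": steps}
            best_score = score
    return best_params, best_score


def toy_evaluate(scale: float, steps: int) -> float:
    # Hypothetical stand-in score, NOT a result from the paper.
    # It peaks at scale 4.0 and 50 steps so the search has something to find.
    return 1.0 - 0.01 * (scale - 4.0) ** 2 - 0.0001 * (steps - 50) ** 2


best_params, best_score = grid_search(
    toy_evaluate,
    guidance_scales=[2.0, 4.0, 8.0],
    step_counts=[25, 50, 100],
)
print(best_params, best_score)
```

In practice each call to `evaluate` would be costly, so a coarse grid like this one (or a smarter search strategy) keeps the number of generate-and-train cycles manageable.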