Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2
This study addresses the need for fine-grained evaluation of image synthesis models on faces, providing a benchmark for researchers and practitioners in generative AI.
The paper quantitatively compares Stable Diffusion, Midjourney, and DALL-E 2 on generating photorealistic faces in the wild. It finds that Stable Diffusion performs best according to the FID score (lower is better), and it introduces GFW, a dataset of 15,076 generated faces.
The field of image synthesis has made great strides in the last couple of years, and recent models can generate images of astonishing quality. However, fine-grained evaluation of these models on interesting categories such as faces is still missing. Here, we conduct a quantitative comparison of three popular systems, Stable Diffusion, Midjourney, and DALL-E 2, in their ability to generate photorealistic faces in the wild. We find that Stable Diffusion generates better faces than the other systems, according to the FID score. We also introduce a dataset of generated faces in the wild, dubbed GFW, comprising a total of 15,076 faces. We hope that our study spurs follow-up research on assessing and improving generative models. Data and code are publicly available.
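Since the comparison rests on the FID score, a minimal sketch of the underlying Fréchet distance may help. It fits a Gaussian to each set of feature vectors and computes $\lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2})$, where lower values mean the two distributions are closer. This is not the authors' evaluation code: the function name `frechet_distance` and the random feature arrays are illustrative assumptions, and a real FID computation would first extract features from real and generated face images with a pretrained Inception network, a step omitted here.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one feature vector per row.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # covariance matrices; clip small negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


# Synthetic stand-ins for image features (assumption: real use would pass
# Inception activations of real and generated faces instead).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
generated = real + 3.0  # same spread, shifted mean

print(frechet_distance(real, real))       # close to zero for identical sets
print(frechet_distance(real, generated))  # grows with the mean shift
```

In this toy setup the second value is driven entirely by the mean shift, since both sets share the same covariance; with real features, differences in covariance contribute as well.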