Detecting GAN generated errors
This work addresses the challenge of assessing image quality for GAN users and offers a tool for selecting better samples, though it is incremental in that it builds on existing GAN frameworks.
The paper tackles the problem of evaluating individual GAN-generated images by proposing a method to detect errors within them, and shows that a ranking based on the proposed metric correlates with FID scores for models such as Improved Wasserstein, BigGAN, and StyleGAN.
Despite the impressive performance of the latest GANs in generating hyper-realistic images, GAN discriminators have difficulty evaluating the quality of an individual generated sample. This is because the task of evaluating the quality of a generated image differs from deciding whether an image is real or fake. A generated image could be perfect except in a single area but still be detected as fake. Instead, we propose a novel approach for detecting where errors occur within a generated image. By collaging real images with generated images, we compute, for each pixel, whether it belongs to the real distribution or the generated distribution. Furthermore, we leverage attention to model long-range dependencies; this allows detection of errors that are reasonable locally but not holistically. For evaluation, we show that our error detection can act as a quality metric for an individual image, unlike FID and IS. We leverage Improved Wasserstein, BigGAN, and StyleGAN to show that a ranking based on our metric correlates impressively with FID scores. Our work opens the door to a better understanding of GANs and to the ability to select the best samples from a GAN model.
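The collaging idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the rectangular paste region, the size fractions, and the mean-based image score are all assumptions made for illustration, and the function names `make_collage` and `image_error_score` are hypothetical. The sketch only shows how a collage yields per-pixel real/generated labels that a segmentation-style detector could be trained on, and how a per-pixel probability map might be collapsed into a single per-image score.

```python
import numpy as np


def make_collage(real, fake, rng, min_frac=0.25, max_frac=0.75):
    """Paste a random rectangle of `fake` into `real`.

    Returns the collage and a per-pixel label mask (1 = generated, 0 = real).
    The rectangular region is an illustrative assumption; the abstract does
    not specify how collages are formed.
    """
    if real.shape != fake.shape:
        raise ValueError("real and fake must have the same shape")
    h, w = real.shape[:2]
    # Rectangle size is a random fraction of each image dimension (assumed).
    rh = int(h * rng.uniform(min_frac, max_frac))
    rw = int(w * rng.uniform(min_frac, max_frac))
    top = rng.integers(0, h - rh + 1)
    left = rng.integers(0, w - rw + 1)
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[top:top + rh, left:left + rw] = 1
    # Broadcast the (H, W) mask over the channel axis of (H, W, C) images.
    collage = np.where(mask[..., None] == 1, fake, real)
    return collage, mask


def image_error_score(pixel_probs):
    """Collapse a per-pixel 'generated' probability map into one score.

    Averaging is an assumed aggregation; lower means fewer detected errors.
    """
    return float(np.mean(pixel_probs))


rng = np.random.default_rng(0)
real = np.zeros((8, 8, 3))  # stand-in for a real image
fake = np.ones((8, 8, 3))   # stand-in for a generated image
collage, mask = make_collage(real, fake, rng)
score = image_error_score(mask)  # using the ground-truth mask as a perfect prediction
```

In practice the mask would serve as the training target for a per-pixel classifier, and `image_error_score` would be applied to that classifier's output on a fully generated image rather than to the ground-truth mask.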