Pros and Cons of GAN Evaluation Measures: New Developments
This paper addresses the open problem of GAN evaluation, which matters to the machine learning community, but its contribution is incremental because it updates a prior review.
This paper updates a previous review of GAN evaluation measures. It notes that, despite the popularity of metrics such as Inception Score and Frechet Inception Distance, GAN evaluation remains unsettled and leaves room for improvement. It also discusses newer dimensions of model assessment, such as bias and fairness, and the connection between GAN evaluation and deepfakes.
This work is an update of a previous paper on the same topic published a few years ago. With the dramatic progress in generative modeling, a suite of new quantitative and qualitative techniques to evaluate models has emerged. Although some measures such as Inception Score, Frechet Inception Distance, Precision-Recall, and Perceptual Path Length are relatively more popular, GAN evaluation is not a settled issue and there is still room for improvement. Here, I describe new dimensions that are becoming important in assessing models (e.g. bias and fairness) and discuss the connection between GAN evaluation and deepfakes. These are important areas of concern in the machine learning community today and progress in GAN evaluation can help mitigate them.