A Distributional Evaluation of Generative Image Models
This work addresses a core challenge for AI researchers and practitioners by providing a more statistically principled evaluation method for generative models, though the contribution is incremental in that it builds on existing metrics.
The paper tackles the evaluation of generative image models. It argues that existing metrics such as FID assume normality and therefore miss distributional differences in the tails, and it proposes the Embedded Characteristic Score (ECS) as a comprehensive metric for assessing distributional match.
Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target distribution in a statistically principled way remains a core challenge. We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard. Commonly adopted metrics, such as the Fréchet Inception Distance (FID), do not sufficiently capture the differences between the learned and target distributions, because the assumption of normality ignores differences in the tails. We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions, and explore its connection with moments and tail behavior. We derive natural properties of ECS and show its practical use via simulations and an empirical study.
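The claim that a normality assumption hides tail differences can be illustrated with a small simulation. The sketch below is not the ECS and is not taken from the paper. It assumes a one-dimensional toy setting, not Inception embeddings, and uses the standard closed form for the squared Fréchet distance between two fitted Gaussians, which in one dimension reduces to (mu_x - mu_y)^2 + (sigma_x - sigma_y)^2. The function names, the sample size, and the choice of a Student-t distribution with 5 degrees of freedom are all illustrative assumptions.

```python
import numpy as np


def frechet_distance_1d(x, y):
    """Squared Frechet distance between 1-D Gaussians fitted to x and y.

    Only the sample means and standard deviations enter the formula,
    which is the Gaussian assumption that FID also relies on.
    """
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2


def excess_kurtosis(x):
    """Sample excess kurtosis (zero for a Gaussian in the large-sample limit)."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0


def tail_mass(x, threshold=4.0):
    """Fraction of samples whose absolute value exceeds the threshold."""
    return np.mean(np.abs(x) > threshold)


rng = np.random.default_rng(0)
n = 200_000   # illustrative sample size
df = 5        # illustrative degrees of freedom for the heavy-tailed sample

gauss = rng.standard_normal(n)
# A Student-t variable with df > 2 has variance df / (df - 2);
# rescale so both samples share mean 0 and variance 1.
heavy = rng.standard_t(df, size=n) / np.sqrt(df / (df - 2))

fd = frechet_distance_1d(gauss, heavy)
kurt_gauss = excess_kurtosis(gauss)
kurt_heavy = excess_kurtosis(heavy)
tail_gauss = tail_mass(gauss)
tail_heavy = tail_mass(heavy)

print(f"Frechet distance (Gaussian fit): {fd:.6f}")
print(f"excess kurtosis: gaussian={kurt_gauss:.3f}, heavy-tailed={kurt_heavy:.3f}")
print(f"mass beyond |x| > 4: gaussian={tail_gauss:.6f}, heavy-tailed={tail_heavy:.6f}")
```

Because the two samples are matched in their first two moments, the Gaussian-fit distance should come out close to zero, while the kurtosis and tail-mass summaries should differ noticeably. This is the kind of mismatch that a metric sensitive to higher moments and tail behavior is meant to detect.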