ML · LG · Jan 6, 2018

A Note on the Inception Score

arXiv:1801.01973v2 · 808 citations
AI Analysis

This paper highlights a critical flaw in evaluation practices for generative models. The contribution is incremental, since it builds on an existing metric, but it is essential for advancing the field.

The paper critiques the Inception Score, a widely used evaluation metric for generative models, showing that it fails to provide useful guidance for model comparisons due to suboptimalities in the metric and issues in its application.

Deep generative models are powerful tools that have produced impressive results in recent years. These advances have been for the most part empirically driven, making it essential that we use high quality evaluation metrics. In this paper, we provide new insights into the Inception Score, a recently proposed and widely used evaluation metric for generative models, and demonstrate that it fails to provide useful guidance when comparing models. We discuss both suboptimalities of the metric itself and issues with its application. Finally, we call for researchers to be more systematic and careful when evaluating and comparing generative models, as the advancement of the field depends upon it.
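For readers unfamiliar with the metric, the Inception Score is usually written as IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's predicted label distribution for a generated sample x and p(y) is the marginal over all samples. Below is a minimal pure-Python sketch of that formula, assuming the conditionals are already available as rows of probabilities (in practice they come from an Inception network, which is not modeled here). The function name `inception_score` and the `splits` parameter are illustrative assumptions, not an official implementation; `splits` mimics the common practice of scoring disjoint chunks of samples and reporting the mean and standard deviation.

```python
import math
from typing import List, Tuple


def _kl(p: List[float], q: List[float]) -> float:
    # KL(p || q) in nats; terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def inception_score(probs: List[List[float]], splits: int = 1) -> Tuple[float, float]:
    """Hypothetical helper: probs[i] is p(y|x_i) for the i-th generated sample.

    Returns (mean, std) of exp(E_x KL(p(y|x) || p(y))) over `splits` disjoint chunks.
    """
    n = len(probs)
    if splits < 1 or splits > n:
        raise ValueError("splits must be between 1 and the number of samples")
    scores = []
    for s in range(splits):
        chunk = probs[s * n // splits:(s + 1) * n // splits]
        k = len(chunk[0])
        # Estimate the marginal p(y) by averaging the conditionals in this chunk.
        marginal = [sum(row[j] for row in chunk) / len(chunk) for j in range(k)]
        mean_kl = sum(_kl(row, marginal) for row in chunk) / len(chunk)
        scores.append(math.exp(mean_kl))
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))
    return mean, std


if __name__ == "__main__":
    # Ten samples, each confidently assigned to a different one of ten classes.
    one_per_class = [[1.0 if i == j else 0.0 for j in range(10)] for i in range(10)]
    print(inception_score(one_per_class))  # mean is (up to rounding) 10, the maximum for 10 classes
    # Samples the classifier is completely unsure about score the minimum of 1.
    print(inception_score([[0.1] * 10 for _ in range(5)]))
```

The first usage case shows a property of the formula itself: ten confidently classified samples, one per class, already reach the maximum score for ten classes, so the score alone says nothing about diversity within a class.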

Code Implementations · 9 repos

Data from Papers with Code (CC-BY-SA-4.0)

