LG · ML · Jan 10, 2020

Towards GAN Benchmarks Which Require Generalization

arXiv:2001.03653v1 · 64 citations
AI Analysis

This paper tackles unreliable benchmarks for unconditional image generation, a problem that matters to researchers and practitioners in generative modeling. The contribution is incremental, since it builds on existing neural network divergence (NND) concepts.

The paper addresses GAN evaluation metrics that can be trivially gamed by memorizing the training set, and proposes neural network divergences (NNDs) as a solution. Experiments show that the authors' implemented metric effectively measures diversity, sample quality, and generalization, and that it cannot be won by memorization.
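To make the memorization problem concrete, here is a toy sketch (an illustration chosen for this summary, not the paper's experiment). It compares model samples against the training set with a one-dimensional Fréchet distance between Gaussian fits, loosely analogous to what FID computes on Inception features. A "model" that replays the training set scores exactly zero, while a model that draws fresh samples from the true distribution scores worse. The names frechet_1d, memorizer and generalizer are hypothetical.

```python
import numpy as np


def frechet_1d(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Fréchet (2-Wasserstein) distance between 1-D Gaussian fits.

    For N(mu1, s1^2) and N(mu2, s2^2) this equals (mu1 - mu2)^2 + (s1 - s2)^2.
    """
    return float((a.mean() - b.mean()) ** 2 + (a.std() - b.std()) ** 2)


rng = np.random.default_rng(0)
train = rng.normal(size=1000)        # training set drawn from N(0, 1)
memorizer = train.copy()             # hypothetical "model" that replays the training set
generalizer = rng.normal(size=1000)  # hypothetical model sampling fresh points from N(0, 1)

print(frechet_1d(memorizer, train))    # exactly 0.0: memorization wins
print(frechet_1d(generalizer, train))  # small but positive
```

Because the metric looks only at low-order statistics that a finite training set already matches, it cannot tell "learned the distribution" apart from "copied the sample". This is the failure mode the paper's necessary condition rules out.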

For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation and implement an example black-box metric based on these ideas. Through experimental validation we show that it can effectively measure diversity, sample quality, and generalization.
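The NND idea can be sketched as a classifier two-sample estimate. First, train a critic to tell real samples from model samples. Then score it on a held-out split. For any critic D, the standard variational bound JS(p, q) >= log 2 + 0.5 E_p[log D(x)] + 0.5 E_q[log(1 - D(x))] holds, so the held-out value estimates a lower bound on the Jensen-Shannon divergence in nats. The sketch below makes several assumptions. A linear logistic-regression critic trained by full-batch gradient descent in NumPy stands in for the paper's neural network. The 50/50 train/evaluation split and the function name nnd_estimate with its parameters are hypothetical. The paper's critic architecture, loss and data protocol may differ.

```python
import numpy as np


def nnd_estimate(real, fake, steps=1000, lr=0.5, seed=0):
    """Held-out classifier estimate of a Jensen-Shannon lower bound (nats).

    Assumption: a linear logistic-regression critic replaces the neural
    network critic; half of each sample set trains it, half evaluates it.
    """
    rng = np.random.default_rng(seed)

    def split(x):
        idx = rng.permutation(len(x))
        half = len(x) // 2
        return x[idx[:half]], x[idx[half:]]

    real_tr, real_te = split(real)
    fake_tr, fake_te = split(fake)

    # Label real samples 1 and model samples 0; D(x) = sigmoid(x @ w + b).
    x = np.concatenate([real_tr, fake_tr])
    y = np.concatenate([np.ones(len(real_tr)), np.zeros(len(fake_tr))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y  # derivative of mean binary cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad) / len(x)
        b -= lr * grad.mean()

    def log_sigmoid(z):
        return -np.logaddexp(0.0, -z)

    # log D on held-out real samples, log(1 - D) = log_sigmoid(-z) on held-out fakes.
    z_real = real_te @ w + b
    z_fake = fake_te @ w + b
    return float(np.log(2.0) + 0.5 * (log_sigmoid(z_real).mean()
                                      + log_sigmoid(-z_fake).mean()))


rng = np.random.default_rng(1)
real = rng.normal(size=(2000, 2))
same = rng.normal(size=(2000, 2))               # same distribution as real
shifted = rng.normal(loc=2.0, size=(2000, 2))   # mean shifted by 2 per dimension

print(nnd_estimate(real, same))     # near zero
print(nnd_estimate(real, shifted))  # clearly positive, below log 2
```

Training and scoring the critic on separate splits matters. A critic evaluated on its own training data can report a spurious divergence simply by overfitting. A linear critic only detects differences in the first moments, which is why the paper relies on neural network critics with far more capacity, fed with large samples from the model.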
