LG CV ML

Oct 22, 2019

Establishing an Evaluation Metric to Quantify Climate Change Image Realism

Sharon Zhou, Alexandra Luccioni, Gautier Cosne, Michael S. Bernstein, Yoshua Bengio

arXiv:1910.10143v1

3.43 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of metrics for evaluating realism in conditional generative models, specifically for climate change awareness applications. It is incremental, however: it adapts existing metrics and does not establish a fully automated solution.

The paper tackles the problem of evaluating the realism of climate change-induced flooding images generated by conditional models, proposing several automated and human-based methods and finding that a modified Fréchet Inception Distance (FID) metric correlates best with human judgments.

With success on controlled tasks, generative models are being increasingly applied to humanitarian applications [1,2]. In this paper, we focus on the evaluation of a conditional generative model that illustrates the consequences of climate change-induced flooding to encourage public interest in and awareness of the issue. Because metrics for comparing the realism of different modes in a conditional generative model do not exist, we propose several automated and human-based methods for evaluation. To do this, we adapt several existing metrics and assess the automated metrics against gold standard human evaluation. We find that using Fréchet Inception Distance (FID) with embeddings from an intermediary Inception-V3 layer that precedes the auxiliary classifier produces results most correlated with human judgments of realism. Although this result is insufficient on its own to establish a human-correlated automatic evaluation metric, we believe this work begins to bridge the gap between human and automated generative evaluation procedures.
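For readers unfamiliar with FID, the following is a minimal sketch of the standard Fréchet distance between two sets of embeddings, each modeled as a Gaussian: the squared distance between the means plus Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^(1/2)). It is not the authors' code. The function name `frechet_distance` and the random arrays are illustrative assumptions; in the setting described above, the inputs would be activations taken from the chosen intermediate Inception-V3 layer for real and generated images, and that extraction step is not shown here.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration, not the paper's implementation.
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs. Clipping guards against small
    # negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    # Placeholder embeddings standing in for Inception-V3 activations.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    fake = rng.normal(loc=0.5, size=(500, 8))
    print(frechet_distance(real, fake))
```

A lower value means the two embedding distributions are closer. Comparing such scores against human realism ratings, for example with a rank correlation, is one way to check how well an automated metric tracks human judgment.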
