CL · AI · CV · LG · Sep 11, 2021

COSMic: A Coherence-Aware Generation Metric for Image Descriptions

arXiv:2109.05281v1 · 663 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better automated evaluation metrics for image captioning models. It is an incremental advance: it builds on theories of discourse coherence to improve existing learned metrics.

The paper tackles the inaccuracy of automated evaluation metrics for image captioning by introducing COSMic. COSMic is the first discourse-aware learned generation metric, and it uses coherence relations to assess the semantic and pragmatic success of a caption. On an out-of-domain test set, it achieves a higher Kendall correlation with human judgments than existing metrics such as BLEURT and BERTScore.

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text. We address this weakness by introducing the first discourse-aware learned generation metric for evaluating image descriptions. Our approach is inspired by computational theories of discourse for capturing information goals using coherence. We present a dataset of image–description pairs annotated with coherence relations. We then train a coherence-aware metric on a subset of the Conceptual Captions dataset and measure its effectiveness (its ability to predict human ratings of output captions) on a test set composed of out-of-domain images. We demonstrate a higher Kendall Correlation Coefficient for our proposed metric with the human judgments for the results of a number of state-of-the-art coherence-aware caption generation models when compared to several other metrics including recently proposed learned metrics such as BLEURT and BERTScore.
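Metric quality is judged here by rank agreement with human ratings, measured with Kendall's correlation. The summary does not say which variant was used, so the sketch below assumes tau-b, which adjusts for tied scores. The function name `kendall_tau_b` and the example scores are hypothetical illustrations, not the paper's data or code.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length score lists (O(n^2) sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    ties_x = ties_y = 0  # pairs tied in exactly one of the two rankings
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: ignored by tau-b
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both rankings order the pair the same way
        else:
            discordant += 1  # the rankings disagree on this pair
    denom = sqrt((concordant + discordant + ties_x) *
                 (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical human ratings and metric scores for four captions.
human = [1, 2, 3, 4]
metric = [0.1, 0.4, 0.3, 0.9]
print(round(kendall_tau_b(human, metric), 3))  # → 0.667
```

In this example, five of the six caption pairs are ordered the same way by the metric and by the human raters. The metric swaps only captions 2 and 3, which gives tau = (5 − 1) / 6. Under this evaluation, a metric with a higher tau ranks captions more like humans do.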

Code implementations: 2 repos
