CV | Mar 12, 2018

Discriminability objective for training descriptive captions

arXiv:1803.04376v2 | 211 citations
Originality: Incremental advance
AI Analysis

This paper addresses non-discriminative captions in image captioning systems. The advance is incremental: it enhances existing methods rather than introducing a new paradigm.

The paper tackles the problem that generated image captions often lack discriminability. It proposes a training objective with a loss component for disambiguating image/caption matches. The resulting captions are much more discriminative according to human evaluation, and standard scores such as BLEU and SPICE also improve.

One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to the ability (of a machine) to disambiguate image/caption matches, we obtain systems that produce much more discriminative captions, according to human evaluation. Remarkably, our approach leads to improvement in other aspects of generated captions, reflected by a battery of standard scores such as BLEU, SPICE, etc. Our approach is modular and can be applied to a variety of model/loss combinations commonly proposed for image captioning.
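
To make the idea of "a loss component for disambiguating image/caption matches" concrete, here is a minimal sketch of how such a term could be added to a captioning loss. The abstract does not spell out the formula, so everything specific here is an assumption for illustration: the max-margin contrastive form with hardest negatives, the function names `discriminability_loss` and `combined_objective`, the `margin` and `weight` parameters, and the idea that `sim[i][j]` comes from some separate image/caption matching model.

```python
from typing import List


def discriminability_loss(sim: List[List[float]], margin: float = 0.2) -> float:
    """Max-margin contrastive loss over a batch similarity matrix (illustrative).

    sim[i][j] is an assumed similarity score between image i and the caption
    generated for image j. Matched image/caption pairs sit on the diagonal.
    """
    n = len(sim)
    if n == 0:
        return 0.0
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Highest-scoring wrong caption for image i (row i, off-diagonal).
        hardest_caption = max(
            (sim[i][j] for j in range(n) if j != i), default=float("-inf")
        )
        # Highest-scoring wrong image for caption i (column i, off-diagonal).
        hardest_image = max(
            (sim[j][i] for j in range(n) if j != i), default=float("-inf")
        )
        # Penalize only when a mismatch scores within `margin` of the match.
        total += max(0.0, margin + hardest_caption - pos)
        total += max(0.0, margin + hardest_image - pos)
    return total / n


def combined_objective(
    caption_loss: float,
    sim: List[List[float]],
    weight: float = 1.0,
    margin: float = 0.2,
) -> float:
    """Captioning loss plus a weighted discriminability term (hypothetical)."""
    return caption_loss + weight * discriminability_loss(sim, margin)


if __name__ == "__main__":
    # Toy batch of two images: the caption for image 1 also fits image 0 well.
    toy_sim = [[0.5, 0.6], [0.1, 0.9]]
    print(discriminability_loss(toy_sim))
    print(combined_objective(2.0, toy_sim, weight=10.0))
```

Because the extra term is simply added to whatever captioning loss is already in use, a sketch like this is consistent with the abstract's claim that the approach is modular across model/loss combinations. In an actual training setup the loss would have to be computed in a differentiable framework or fed back as a reward, which this plain-Python version does not attempt.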

Code Implementations: 1 repo
