CL · Apr 15, 2018

Pragmatically Informative Image Captioning with Character-Level Inference

arXiv:1804.05417v21122 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating more discriminative image captions. It is an incremental advance: it builds on existing RSA methods and contributes a novel efficiency improvement.

The authors tackle the problem of making neural image captions pragmatically informative, that is, able to distinguish their input image from similar images. They do so with a character-level Rational Speech Acts (RSA) model, which outperforms both a non-pragmatic baseline and a word-level RSA captioner.

We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters ("a","b","c"...) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our model outperforms a non-pragmatic baseline as well as a word-level RSA captioner.
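
The character-level idea can be illustrated with a minimal sketch, assuming a toy literal speaker S0 given as a fixed next-character table for two hypothetical images (img0, img1) rather than a neural captioner. At each step the pragmatic speaker S1 rescores every candidate character by how well a literal listener L0, which updates its belief over images as characters arrive, would recover the target image. The normalization therefore runs over the character alphabet instead of over all possible captions. The function names, the toy probabilities, the greedy decoding loop, and the rationality exponent alpha are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical toy literal speaker S0: for each image, a fixed distribution
# over the next character. A real system would run a character-level neural
# captioner conditioned on the image and the caption prefix.
TOY_S0 = {
    "img0": {"a": 0.5, "b": 0.4, "c": 0.1},
    "img1": {"a": 0.6, "b": 0.1, "c": 0.3},
}


def literal_speaker(prefix, image):
    """S0(c | prefix, image). This toy version ignores the prefix."""
    return TOY_S0[image]


def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def literal_listener(prefix, char, prior):
    """L0(w | prefix + char), proportional to S0(char | prefix, w) * prior(w)."""
    return normalize(
        {w: literal_speaker(prefix, w)[char] * p for w, p in prior.items()}
    )


def pragmatic_speaker(prefix, target, prior, alpha=1.0):
    """S1(c | prefix, target), proportional to
    S0(c | prefix, target) * L0(target | prefix + c) ** alpha.
    Normalization is over the small character alphabet only."""
    s0 = literal_speaker(prefix, target)
    scores = {
        c: p * literal_listener(prefix, c, prior)[target] ** alpha
        for c, p in s0.items()
    }
    return normalize(scores)


def greedy_decode(target, images, steps, alpha=1.0):
    """Greedily unroll a caption one character at a time (assumed decoding scheme)."""
    prior = {w: 1.0 / len(images) for w in images}  # uniform initial belief
    prefix = ""
    for _ in range(steps):
        s1 = pragmatic_speaker(prefix, target, prior, alpha)
        char = max(s1, key=s1.get)
        # After emitting a character, the listener's posterior becomes the
        # prior for the next step, so later choices depend on earlier ones.
        prior = literal_listener(prefix, char, prior)
        prefix += char
    return prefix, prior
```

With these toy numbers the literal speaker's most likely first character for img0 is "a", which img1 favors even more, whereas S1 picks the more distinguishing "b". Calling greedy_decode("img0", ["img0", "img1"], steps=3) returns the prefix "bab", and the listener's belief in img0 rises to about 0.93.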
