CVCLLGDec 19, 2018

Generating Diverse and Meaningful Captions

arXiv:1812.08126v11 citations
Originality Incremental advance
AI Analysis

This addresses a limitation in image captioning for applications requiring varied and specific descriptions, though it appears incremental as it builds on existing methods.

The paper tackled the problem of image captioning models producing generic captions for similar images, and improved state-of-the-art results in caption diversity and novelty through an unsupervised training approach using an Image Retrieval model.

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes