CV · Mar 30, 2016

Rich Image Captioning in the Wild

arXiv:1603.09016v2 · 129 citations
AI Analysis

This work addresses the need for robust image captioning in practical applications, though it appears incremental as it builds on an existing framework.

The paper tackles the problem of generating high-quality image captions for diverse real-world images, achieving significant performance improvements over previous state-of-the-art systems on both in-domain and out-of-domain datasets.

We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include producing high-quality captions with respect to human judgments, handling out-of-domain data, and meeting the low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine significantly outperforms previous state-of-the-art systems on both the in-domain dataset (i.e., MS COCO) and out-of-domain datasets.
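The abstract names three components (a concept detector, an entity recognizer, and a confidence model) but does not say how they are wired together. The following is a minimal sketch of one plausible composition, not the paper's implementation: every name here (`Caption`, `caption_image`, the callable parameters, the `threshold` default, and the "Possibly: " hedge prefix) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Caption:
    text: str
    confidence: float  # hypothetical score, assumed to lie in [0, 1]


def caption_image(
    image: object,
    detect_concepts: Callable[[object], List[str]],
    recognize_entity: Callable[[object], Optional[str]],
    generate: Callable[[List[str], Optional[str]], str],
    score: Callable[[str, List[str]], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Caption:
    """Compose the three stages named in the abstract (illustrative only)."""
    # Stage 1: the vision model proposes visual concepts for the image.
    concepts = detect_concepts(image)
    # Stage 2: the entity model may name a celebrity or landmark, or return None.
    entity = recognize_entity(image)
    # The language model turns concepts (and the entity, if any) into a sentence.
    text = generate(concepts, entity)
    # Stage 3: the confidence model scores the candidate caption.
    confidence = score(text, concepts)
    # Assumption: low-confidence captions are hedged rather than dropped.
    if confidence < threshold:
        text = "Possibly: " + text
    return Caption(text=text, confidence=confidence)


if __name__ == "__main__":
    # Stub components standing in for trained models.
    result = caption_image(
        image="photo.jpg",
        detect_concepts=lambda img: ["man", "guitar", "stage"],
        recognize_entity=lambda img: None,
        generate=lambda concepts, entity: "a man playing a guitar on a stage",
        score=lambda text, concepts: 0.3,
    )
    print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular model or library, and the stubs can be swapped for real models without changing the control flow.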

