CV, LG; Nov 3, 2020

Attention Beam: An Image Captioning Approach

arXiv:2011.01753v2, 6 citations
AI Analysis

This work addresses image captioning, a task at the intersection of computer vision and natural language processing. It is incremental in that it builds on existing encoder-decoder methods.

The authors tackle the problem of generating textual descriptions for images by proposing a beam search heuristic on top of encoder-decoder architectures, which improves caption quality on the benchmark datasets Flickr8k, Flickr30k, and MS COCO.

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines, as it requires the ability to comprehend the image (computer vision) and then generate a human-like description of it (natural language understanding). In recent times, encoder-decoder based architectures have achieved state-of-the-art results for image captioning. Here, we present a beam search heuristic on top of the encoder-decoder based architecture that gives better quality captions on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
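For readers unfamiliar with beam search, the following is a minimal sketch of plain (textbook) beam search over a decoder's next-token distribution. It is not the authors' heuristic, whose details are not given in this summary. The names `beam_search`, `toy_step`, and `TOY_MODEL` are hypothetical, and the hand-written bigram table stands in for a real encoder-decoder captioning model conditioned on an image.

```python
import math

# Hypothetical stand-in for a trained decoder: next-token probabilities
# that depend only on the previous token (a real captioner would also
# condition on encoded image features).
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(prefix):
    """Return {token: log-probability} for the token following `prefix`."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix[-1]].items()}


def beam_search(step_log_probs, beam_width=3, max_len=10, bos="<s>", eos="</s>"):
    """Standard beam search; returns (tokens, total log-probability)."""
    beams = [([bos], 0.0)]  # partial hypotheses still being extended
    finished = []           # hypotheses that have emitted `eos`
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in step_log_probs(tokens).items()
        ]
        # Keep only the `beam_width` highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    # Prefer completed captions; fall back to unfinished ones if none ended.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])


greedy_tokens, greedy_score = beam_search(toy_step, beam_width=1)
beam_tokens, beam_score = beam_search(toy_step, beam_width=2)
print(" ".join(greedy_tokens), round(math.exp(greedy_score), 2))
print(" ".join(beam_tokens), round(math.exp(beam_score), 2))
```

With `beam_width=1` the search is greedy and commits to "a" because it is the likeliest first word, ending with a sequence of probability 0.33. With `beam_width=2` the hypothesis starting with "the" survives the first step and wins with probability 0.36, which illustrates why a wider beam can yield better captions than greedy decoding.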

