CV · May 17, 2015

Exploring Nearest Neighbor Approaches for Image Captioning

arXiv:1505.04467v1 · 201 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses image captioning for computer vision applications, but it is incremental as it revisits baseline methods without introducing new techniques.

The paper tackles image captioning with nearest neighbor approaches that borrow a caption from similar training images. On the automatic metrics reported by the MS COCO caption evaluation server, these baselines perform as well as many recent methods that generate novel captions, but human studies show that novel caption generation is still preferred.

We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When measured by automatic evaluation metrics on the MS COCO caption evaluation server, these approaches perform as well as many recent approaches that generate novel captions. However, human studies show that a method that generates novel captions is still preferred over the nearest neighbor approach.
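The retrieve-then-select procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which image features or caption similarity are used, so cosine similarity over toy feature vectors and Jaccard word overlap stand in here as assumptions, and the names `consensus_caption`, `cosine`, and `jaccard` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(c1, c2):
    """Word-overlap similarity between two captions (assumed stand-in metric)."""
    a, b = set(c1.lower().split()), set(c2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus_caption(query_feat, train_feats, train_captions, k=2):
    """Borrow the caption that best agrees with the other candidate captions.

    train_feats[i] is the feature vector of training image i, and
    train_captions[i] is the list of captions attached to that image.
    """
    # Step 1: find the k training images most similar to the query image.
    ranked = sorted(
        range(len(train_feats)),
        key=lambda i: cosine(query_feat, train_feats[i]),
        reverse=True,
    )
    # Step 2: pool the captions of those neighbors into a candidate set.
    candidates = [c for i in ranked[:k] for c in train_captions[i]]
    if not candidates:
        raise ValueError("no candidate captions found")

    # Step 3: score each candidate by its total similarity to the others
    # and return the highest-scoring one as the "consensus" caption.
    def score(idx):
        return sum(
            jaccard(candidates[idx], other)
            for j, other in enumerate(candidates)
            if j != idx
        )

    return candidates[max(range(len(candidates)), key=score)]


# Toy data, invented purely for illustration.
train_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_captions = [
    ["a dog runs on grass", "a dog on grass"],
    ["a dog runs in a park"],
    ["a plate of food"],
]
print(consensus_caption([1.0, 0.05], train_feats, train_captions, k=2))
```

The selected caption is always one that already exists in the training set, which is the defining property of the baseline: nothing novel is generated, only borrowed.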

Code Implementations: 1 repo
