CV, CL · May 3, 2016

Improving Image Captioning by Concept-based Sentence Reranking

arXiv:1605.00855v1
Originality: Incremental advance
AI Analysis

This work addresses image captioning. It is an incremental advance: it builds on an existing captioning model and adds a novel reranking step on top of it.

The paper tackles image captioning by introducing concept-based sentence reranking to improve Google's CNN-LSTM model. It achieves a METEOR score of 0.1875 on the ImageCLEF 2015 test set, ahead of the runner-up's 0.1687.

This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task. We improve Google's CNN-LSTM model by introducing concept-based sentence reranking, a data-driven approach which exploits the large amounts of concept-level annotations on Flickr. Unlike previous uses of concept detection, which are tailored to specific image captioning models, the proposed approach reranks predicted sentences according to how well they match the detected concepts, essentially treating the underlying model as a black box. This property makes the approach applicable to a number of existing solutions. We also experiment with fine-tuning the deep language model, which improves the performance further. With a METEOR score of 0.1875 on the ImageCLEF 2015 test set, our system outperforms the runner-up (METEOR of 0.1687) by a clear margin.
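The reranking idea can be illustrated with a minimal sketch. It assumes that the captioning model returns candidate sentences with a score, that a concept detector returns a confidence per concept, and that the two are combined by a simple weighted sum. The function name `rerank`, the `weight` parameter, and the additive scoring rule are illustrative assumptions, not the formula used in the paper.

```python
import re
from typing import Dict, List, Tuple


def rerank(
    candidates: List[Tuple[str, float]],
    concept_scores: Dict[str, float],
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Reorder candidate sentences by model score plus a concept-match bonus.

    Hypothetical scoring rule: the captioning model is treated as a black box
    that only supplies (sentence, score) pairs.
    """
    rescored = []
    for sentence, model_score in candidates:
        # Lowercased word set of the candidate sentence.
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        # Sum the detector confidence of every concept the sentence mentions.
        bonus = sum(
            conf for concept, conf in concept_scores.items() if concept in words
        )
        rescored.append((sentence, model_score + weight * bonus))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy inputs: model scores and concept confidences are made up for illustration.
candidates = [("A man riding a horse.", -1.0), ("A dog on the beach.", -1.2)]
concepts = {"dog": 0.9, "beach": 0.8, "horse": 0.1}
reranked = rerank(candidates, concepts)
```

Because the function only consumes sentences and scores, any captioning model that can emit several candidates could be plugged in, which mirrors the black-box property described above.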

