CL | Apr 21, 2018

Entity-aware Image Caption Generation

Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang

arXiv:1804.07889v2 | 115 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more informative, news-style image descriptions, particularly in applications that require recognizing specific entities. It is an incremental advance, since it builds on existing CNN-LSTM captioning and knowledge graph methods.

The paper tackles the problem that generated image captions lack specific named entities. It proposes a new task that takes images and hashtags as input, and the resulting model significantly outperforms unimodal baselines on a new Flickr benchmark dataset.

Current image captioning approaches generate descriptions that lack specific information, such as the named entities involved in the images. In this paper, we propose a new task that aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to this problem. We first train a convolutional neural network-long short-term memory (CNN-LSTM) model to generate a template caption based on the input image. Then we use a knowledge-graph-based collective inference algorithm to fill in the template with specific named entities retrieved via the hashtags. Experiments on a new benchmark dataset collected from Flickr show that our model generates news-style image descriptions with much richer information. Our model significantly outperforms unimodal baselines on various evaluation metrics.
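The second stage described in the abstract (filling a template caption's entity-type placeholders with entities retrieved via hashtags) can be illustrated with a minimal Python sketch. It assumes a template already produced by the CNN-LSTM, plus a hypothetical toy knowledge graph (`EDGES`) and a hypothetical hashtag lexicon (`HASHTAG_LEXICON`). The relatedness score (a direct edge plus shared neighbors) and the brute-force search over joint assignments are simple stand-ins chosen for illustration, not the paper's actual collective inference algorithm.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical toy knowledge graph (undirected edges); a real system
# would query a large knowledge base instead.
EDGES = [
    ("Lionel Messi", "FC Barcelona"),
    ("FC Barcelona", "Camp Nou"),
    ("Lionel Messi", "Argentina"),
    ("Leo Tolstoy", "Russia"),
]

# Hypothetical hashtag lexicon: hashtag -> list of (entity, type) candidates.
HASHTAG_LEXICON = {
    "#leo": [("Lionel Messi", "PERSON"), ("Leo Tolstoy", "PERSON")],
    "#campnou": [("Camp Nou", "LOCATION")],
}


def build_graph(edges):
    """Turn an edge list into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def relatedness(graph, a, b):
    """Toy score: 1 for a direct edge, plus 1 per shared neighbor."""
    direct = 1.0 if b in graph[a] else 0.0
    return direct + len(graph[a] & graph[b])


def candidates_from_hashtags(hashtags):
    """Group candidate entities by type, e.g. {'PERSON': [...]}."""
    by_type = defaultdict(list)
    for tag in hashtags:
        for entity, etype in HASHTAG_LEXICON.get(tag.lower(), []):
            if entity not in by_type[etype]:
                by_type[etype].append(entity)
    return by_type


def collective_fill(template, hashtags, graph):
    """Fill <TYPE> placeholders by choosing all slot entities jointly."""
    # Each slot type is assumed to appear at most once in the template.
    slots = list(dict.fromkeys(
        tok[1:-1] for tok in template if tok.startswith("<") and tok.endswith(">")
    ))
    cands = candidates_from_hashtags(hashtags)
    fillable = [s for s in slots if cands.get(s)]

    best, best_score = {}, float("-inf")
    # Brute force over every joint assignment (fine for a handful of slots);
    # the score sums pairwise relatedness, so slots influence each other.
    for combo in product(*(cands[s] for s in fillable)):
        score = sum(relatedness(graph, a, b) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = dict(zip(fillable, combo)), score

    # Placeholders with no retrieved candidate are left unchanged.
    return " ".join(
        best.get(tok[1:-1], tok) if tok.startswith("<") else tok for tok in template
    )


if __name__ == "__main__":
    template = ["<PERSON>", "celebrates", "a", "goal", "at", "<LOCATION>", "."]
    print(collective_fill(template, ["#leo", "#CampNou"], build_graph(EDGES)))
    # prints: Lionel Messi celebrates a goal at Camp Nou .
```

Choosing the slot entities jointly is what makes the fill collective: the ambiguous `#leo` resolves to Lionel Messi because that candidate shares a neighbor with Camp Nou, whereas scoring the PERSON slot in isolation would leave the two Leo candidates indistinguishable.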
