CV, CL, LG · Feb 5, 2016

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

arXiv:1602.01895v1
Originality: Incremental advance
AI Analysis

This work addresses image captioning for computer vision and NLP applications, but it is incremental as it builds on existing CNN-RNN frameworks with a memory enhancement.

The paper tackles the problem of generating natural language descriptions for images. It introduces a model that adds memory cells to gate the feeding of image features to a deep neural network. The authors report that the model outperforms state-of-the-art models, with higher BLEU scores on the Flickr8K and Flickr30K datasets.

Generating natural language descriptions for images is a challenging task. The traditional approach is to use a convolutional neural network (CNN) to extract image features, followed by a recurrent neural network (RNN) to generate sentences. In this paper, we present a new model that adds memory cells to gate the feeding of image features to the deep neural network. The intuition is to enable our model to memorize how much information from the image should be fed in at each stage of the RNN. Experiments on the Flickr8K and Flickr30K datasets show that our model outperforms other state-of-the-art models, with higher BLEU scores.
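The gating idea in the abstract can be illustrated with a small sketch. This is not the paper's architecture or its equations, which the abstract does not give. It assumes a plain tanh RNN and a single sigmoid gate, computed from the previous hidden state and the current word, that scales each image feature before it enters the recurrent update. All names (`init_params`, `gated_rnn_step`, `Wg_h`, `Wg_x`, `W_v`) and the dimensions are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(hidden, word, feat, seed=0):
    # Small random weights; shapes are (rows x cols). Hypothetical layout.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wg_h": mat(feat, hidden),  # gate weights on previous hidden state
        "Wg_x": mat(feat, word),    # gate weights on current word vector
        "b_g": [0.0] * feat,        # gate bias
        "W_h": mat(hidden, hidden), # recurrent weights
        "W_x": mat(hidden, word),   # word input weights
        "W_v": mat(hidden, feat),   # (gated) image feature weights
    }


def gated_rnn_step(h_prev, word_vec, img_feat, p):
    # Gate in (0, 1) per image-feature dimension: how much of the image
    # to feed in at this step (assumed form, not taken from the paper).
    gate_pre = [
        a + b + c
        for a, b, c in zip(matvec(p["Wg_h"], h_prev), matvec(p["Wg_x"], word_vec), p["b_g"])
    ]
    gate = [sigmoid(g) for g in gate_pre]
    gated_img = [g * v for g, v in zip(gate, img_feat)]

    # Plain tanh RNN update that also receives the gated image features.
    h_pre = [
        a + b + c
        for a, b, c in zip(
            matvec(p["W_h"], h_prev), matvec(p["W_x"], word_vec), matvec(p["W_v"], gated_img)
        )
    ]
    h = [math.tanh(x) for x in h_pre]
    return h, gate


# Usage: run three steps over toy one-hot "words" with a fixed image feature.
params = init_params(hidden=4, word=3, feat=5, seed=0)
h = [0.0] * 4
img = [0.5, -1.0, 2.0, 0.0, 1.5]
for word_vec in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    h, gate = gated_rnn_step(h, word_vec, img, params)
```

Because the gate depends on the hidden state, the share of image information fed in can change from one word to the next, which is the intuition the abstract describes. In a real captioning model the image features would come from a CNN and the weights would be learned.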

