CV | CL | Oct 31, 2019

Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

arXiv:1910.14208v2
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in RNN-based image captioning, namely the noisy gradient signals that reach the decoder's hidden states, and offers an incremental improvement by supervising those hidden states directly.

The paper tackles the problem of noisy gradient signals in RNN-based image captioning models by proposing Hidden State Guidance (HSG), a framework that matches hidden states to a teacher decoder trained on autoencoding captions, resulting in improved performance over state-of-the-art decoders.

Most RNN-based image captioning models receive supervision on the output words to mimic human captions. Therefore, the hidden states can only receive noisy gradient signals via layers of back-propagation through time, leading to less accurate generated captions. Consequently, we propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on an easier task of autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-based evaluation metrics equally distributed to each generated word, no matter their relevance. HSG provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using either raw images or detected objects as inputs.
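The contrast between a sentence-level reward and a word-level reward can be illustrated with a small sketch. This is not the paper's implementation: the reward form (negative squared distance between the caption decoder's and the teacher decoder's hidden states at each time step), the `weight` parameter, and all function names are assumptions made for illustration only.

```python
from typing import List

Vector = List[float]


def sentence_level_rewards(sentence_score: float, num_words: int) -> List[float]:
    """Conventional setup: one sentence-level metric score is copied to every word."""
    return [sentence_score] * num_words


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hidden_state_rewards(student: List[Vector], teacher: List[Vector]) -> List[float]:
    """Hypothetical word-level reward: the closer the student's hidden state is
    to the teacher's at a time step, the higher (less negative) the reward."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must cover the same time steps")
    return [-squared_distance(s, t) for s, t in zip(student, teacher)]


def combined_rewards(
    sentence_score: float,
    student: List[Vector],
    teacher: List[Vector],
    weight: float = 1.0,
) -> List[float]:
    """Assumed combination: sentence-level reward plus a weighted word-level term."""
    base = sentence_level_rewards(sentence_score, len(student))
    guide = hidden_state_rewards(student, teacher)
    return [b + weight * g for b, g in zip(base, guide)]


# Toy hidden states for a three-word caption (made-up numbers).
student = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
teacher = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

rewards = combined_rewards(1.0, student, teacher, weight=0.5)
print(rewards)  # [1.0, 0.5, -1.0]
```

In this toy example, the sentence-level reward alone would assign 1.0 to all three words, while the added word-level term lowers the reward for the steps whose hidden states drift furthest from the teacher's.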

