CV · AI · Mar 30, 2018

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

arXiv:1803.11439v2 · 97 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses regularization in RNNs for caption generation, offering incremental improvements for applications like image and code captioning.

The paper tackles the regularization of recurrent neural networks (RNNs) for caption generation. It proposes the Auto-Reconstructor Network (ARNet), which reconstructs the previous hidden state from the current one, so that the current state embeds more information about its predecessor and the transition dynamics are regularized. The reported results are improved performance on image and source code captioning tasks and a smaller discrepancy between training and inference.

Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on. In this paper, we propose a novel architecture, namely Auto-Reconstructor Network (ARNet), which, coupled with the conventional encoder-decoder framework, works in an end-to-end fashion to generate captions. ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. Therefore, ARNet encourages the current hidden state to embed more information from the previous one, which can help regularize the transition dynamics of recurrent neural networks (RNNs). Extensive experimental results show that our proposed ARNet boosts the performance over the existing encoder-decoder models on both image captioning and source code captioning tasks. Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. Furthermore, the performance on permuted sequential MNIST demonstrates that ARNet can effectively regularize RNNs, especially on modeling long-term dependencies. Our code is available at: https://github.com/chenxinpeng/ARNet
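
The idea of reconstructing the previous hidden state from the present one can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain tanh RNN as the decoder, a hypothetical linear reconstructor `W_r`, a squared-error penalty, and a hypothetical weight `lam` for combining the penalty with the captioning loss. The linked repository is the reference for the actual architecture.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rnn_step(W_h, W_x, h_prev, x):
    """Plain tanh RNN transition (assumed decoder): h_t = tanh(W_h h_{t-1} + W_x x_t)."""
    a = matvec(W_h, h_prev)
    b = matvec(W_x, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def reconstruction_penalty(W_r, h_prev, h_curr):
    """Squared error between h_{t-1} and its reconstruction from h_t.

    W_r is a hypothetical linear reconstructor used only for illustration.
    """
    h_rec = matvec(W_r, h_curr)
    return sum((p - r) ** 2 for p, r in zip(h_prev, h_rec))


def sequence_penalty(W_h, W_x, W_r, inputs, h0):
    """Average reconstruction penalty over all time steps of one sequence."""
    h_prev = h0
    total = 0.0
    for x in inputs:
        h_curr = rnn_step(W_h, W_x, h_prev, x)
        total += reconstruction_penalty(W_r, h_prev, h_curr)
        h_prev = h_curr
    return total / len(inputs)


def rand_matrix(rows, cols, rng, scale=0.5):
    """Small random matrix with entries in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden, emb, steps = 4, 3, 5  # illustrative sizes, not taken from the paper
W_h = rand_matrix(hidden, hidden, rng)
W_x = rand_matrix(hidden, emb, rng)
W_r = rand_matrix(hidden, hidden, rng)
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(emb)] for _ in range(steps)]
h0 = [0.0] * hidden

penalty = sequence_penalty(W_h, W_x, W_r, inputs, h0)

# During training, the penalty would be added to the usual captioning loss:
#   total_loss = caption_loss + lam * penalty   (lam is an assumed hyperparameter)
print(f"reconstruction penalty: {penalty:.4f}")
```

Minimizing this penalty alongside the captioning loss pushes each hidden state to retain enough information to recover its predecessor, which is the regularizing effect the abstract describes.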

Code Implementations: 1 repo