CL · CV · LG · Feb 25, 2021

Retrieval Augmentation for Deep Neural Networks

arXiv:2102.13030v2 · Has Code
AI Analysis

This work addresses an inefficiency in how neural networks for vision and language tasks use their training data: each prediction ignores every training example other than the current one. The contribution appears incremental, since it builds on existing retrieval and attention mechanisms.

The paper tackles the problem that deep neural networks ignore the rest of the training data when making a prediction. It proposes a retrieval augmentation approach that uses the nearest training examples to aid each prediction. The authors report that the approach is effective on image captioning and sentiment analysis, evaluated on the Flickr8 and IMDB datasets.

Deep neural networks have achieved state-of-the-art results in various vision and/or language tasks. Despite the use of large training datasets, most models are trained by iterating over single input-output pairs, discarding the remaining examples for the current prediction. In this work, we actively exploit the training data, using the information from nearest training examples to aid the prediction both during training and testing. Specifically, our approach uses the target of the most similar training example to initialize the memory state of an LSTM model, or to guide attention mechanisms. We apply this approach to image captioning and sentiment analysis, respectively through image and text retrieval. Results confirm the effectiveness of the proposed approach for the two tasks, on the widely used Flickr8 and IMDB datasets. Our code is publicly available at http://github.com/RitaRamo/retrieval-augmentation-nn.
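
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the tanh projection, the toy vectors, and the names `retrieve_nearest` and `init_memory_from_target` are all assumptions made for illustration. It finds the most similar training example and maps that example's target to an initial LSTM memory (cell) state.

```python
import numpy as np

def retrieve_nearest(query, train_inputs, exclude=None):
    """Return the index of the training input most similar to `query`.

    Cosine similarity is an assumption; the abstract does not name a metric.
    `exclude` skips one index, e.g. the example itself during training.
    """
    q = query / np.linalg.norm(query)
    t = train_inputs / np.linalg.norm(train_inputs, axis=1, keepdims=True)
    sims = t @ q
    if exclude is not None:
        sims[exclude] = -np.inf
    return int(np.argmax(sims))

def init_memory_from_target(target_vec, W, b):
    """Project the retrieved target's vector to an initial LSTM cell state.

    The linear map followed by tanh is a hypothetical choice.
    """
    return np.tanh(W @ target_vec + b)

rng = np.random.default_rng(0)

# Toy stand-ins for encoded training inputs (images or texts) and their targets.
train_inputs = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
train_targets = rng.normal(size=(3, 4))   # one target vector per training example

W = rng.normal(size=(5, 4))               # hypothetical projection to a 5-dim cell state
b = np.zeros(5)

query = np.array([0.1, 0.9, 0.2])         # encoded input for the current prediction
idx = retrieve_nearest(query, train_inputs)
c0 = init_memory_from_target(train_targets[idx], W, b)
```

In a full model, `c0` would be passed as the initial cell state of the LSTM decoder or classifier. During training, passing `exclude` keeps an example from retrieving itself as its own nearest neighbour.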
