CL · LG · NE · Jan 1, 2019

Transfer learning from language models to image caption generators: Better models may not transfer better

arXiv:1901.01216v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of optimizing transfer learning for image captioning. It shows that language-model quality metrics such as perplexity do not translate directly into downstream captioning performance. The contribution is an incremental advance for researchers in multimodal AI.
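Perplexity is the exponential of the average negative log-probability that a language model assigns to each reference token: PP = exp(-(1/N) Σ log p(w_i | w_1, ..., w_{i-1})). Lower is better. The following is a minimal sketch that assumes the per-token probabilities have already been produced by some model. The function name `perplexity` is illustrative and not taken from the paper's code.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sequence, given the model's probability for each
    reference token: exp of the mean negative log-likelihood."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Assigning 0.25 to every token is like guessing uniformly among 4 words.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
# Geometric mean of 0.5 and 0.125 is also 0.25, so perplexity is again about 4.
mixed = perplexity([0.5, 0.125])
print(round(uniform, 6), round(mixed, 6))
```

The paper's finding is that a lower value on this metric did not reliably produce a better caption generator after transfer.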

The study investigates whether transferring parameters from a neural language model to an image caption generator improves performance. It finds that such transfer does improve results relative to training from scratch, but that the best language models do not yield the best caption generators.

When designing a neural caption generator, a convolutional neural network can be used to extract image features. Is it possible to also use a neural language model to extract sentence prefix features? We answer this question by trying different ways to transfer the recurrent neural network and embedding layer from a neural language model to an image caption generator. We find that image caption generators with transferred parameters perform better than those trained from scratch, even when those parameters are simply pre-trained on the text of the same captions dataset that the caption generator will later be trained on. We also find that the best language models (in terms of perplexity) do not result in the best caption generators after transfer learning.
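The abstract describes copying the embedding layer and RNN from a trained language model into a caption generator that also receives CNN image features. The sketch below illustrates that idea under several assumptions. Nested lists stand in for weight tensors. The layer names (`embedding`, `rnn`, `image_proj`, `softmax`) and the two options shown (copy then fine-tune, or copy then freeze) are hypothetical. They are not presented as the paper's exact transfer variants.

```python
import copy
import random

# Hypothetical layer names; the paper does not specify these identifiers.
TRANSFERABLE_LAYERS = ("embedding", "rnn")


def random_matrix(rows, cols, rng):
    """Toy stand-in for a weight tensor: a list of rows of floats."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_language_model(vocab, embed, hidden, seed=0):
    rng = random.Random(seed)
    return {
        "embedding": random_matrix(vocab, embed, rng),
        "rnn": random_matrix(embed + hidden, hidden, rng),
        "softmax": random_matrix(hidden, vocab, rng),
    }


def init_caption_generator(vocab, embed, hidden, image_dim, seed=1):
    rng = random.Random(seed)
    return {
        "embedding": random_matrix(vocab, embed, rng),
        "rnn": random_matrix(embed + hidden, hidden, rng),
        "image_proj": random_matrix(image_dim, hidden, rng),  # CNN features -> hidden size
        "softmax": random_matrix(hidden, vocab, rng),
    }


def shape(matrix):
    return (len(matrix), len(matrix[0]))


def transfer(lm_params, cap_params, freeze=False):
    """Return a copy of cap_params whose embedding and RNN weights come from
    the language model, plus the set of layer names to keep frozen."""
    new_params = copy.deepcopy(cap_params)  # leave the caller's dict untouched
    frozen = set()
    for name in TRANSFERABLE_LAYERS:
        if shape(lm_params[name]) != shape(cap_params[name]):
            raise ValueError(f"shape mismatch for layer {name!r}")
        new_params[name] = copy.deepcopy(lm_params[name])
        if freeze:
            frozen.add(name)
    return new_params, frozen


def trainable_layers(params, frozen):
    return sorted(name for name in params if name not in frozen)


lm = init_language_model(vocab=50, embed=8, hidden=16)
cap = init_caption_generator(vocab=50, embed=8, hidden=16, image_dim=32)

fine_tuned, frozen_a = transfer(lm, cap, freeze=False)
fixed, frozen_b = transfer(lm, cap, freeze=True)
print(trainable_layers(fine_tuned, frozen_a))  # ['embedding', 'image_proj', 'rnn', 'softmax']
print(trainable_layers(fixed, frozen_b))       # ['image_proj', 'softmax']
```

Only the embedding and recurrent layers are copied, matching the abstract. The image projection and output layers keep their own initialisation and are always trained on the captioning data.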

Code Implementations: 1 repo