CV | AI | Jul 27, 2023

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

arXiv:2307.14750v3 | 2 citations | h-index: 51 | Has Code
Originality: Highly original
AI Analysis

This paper addresses annotation-free image captioning, a known bottleneck in AI and computer vision applications, and proposes a novel method for it.

The paper tackles the problem of training image captioners without annotated image-sentence pairs. It proposes Retrieval-augmented Pseudo Sentence Generation (RaPSG), which uses retrieval and large pre-trained models to generate high-quality pseudo sentences. The authors report that the method outperforms state-of-the-art models in zero-shot, unsupervised, semi-supervised, and cross-domain scenarios.

Recently, training an image captioner without annotated image-sentence pairs has gained traction. Previous methods have faced limitations due to either using mismatched corpora for inaccurate pseudo annotations or relying on resource-intensive pre-training. To alleviate these challenges, we propose a new strategy where the prior knowledge from large pre-trained models (LPMs) is distilled and leveraged as supervision, and a retrieval process is integrated to further reinforce its effectiveness. Specifically, we introduce Retrieval-augmented Pseudo Sentence Generation (RaPSG), which can efficiently retrieve highly relevant short region descriptions from the mismatching corpora and use them to generate a variety of high-quality pseudo sentences via LPMs. Additionally, we introduce a fluency filter and a CLIP guidance objective to enhance contrastive information learning. Experimental results indicate that our method outperforms SOTA captioning models across various settings including zero-shot, unsupervised, semi-supervised, and cross-domain scenarios. Code is available at: https://github.com/Zhiyuan-Li-John/RaPSG.
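
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the image and each region description have already been embedded in a shared space (for example by a CLIP-style encoder), and it ranks descriptions by cosine similarity, which the abstract does not specify. The function names `cosine`, `retrieve_top_k`, and `build_prompt`, the toy vectors, and the prompt wording are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(image_emb, corpus, k=2):
    """Return the k descriptions whose embeddings are most similar to the image.

    corpus is a list of (description, embedding) pairs. The similarity measure
    is an assumption made for this sketch, not taken from the paper.
    """
    scored = [(cosine(image_emb, emb), text) for text, emb in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(descriptions):
    """Join retrieved descriptions into a hypothetical prompt for a language model."""
    return (
        "Write one caption that summarizes these region descriptions: "
        + "; ".join(descriptions)
    )


# Toy embeddings standing in for real encoder outputs (assumed, not real data).
image_emb = [1.0, 0.0, 0.0]
corpus = [
    ("a dog on grass", [0.9, 0.1, 0.0]),
    ("a red car", [0.0, 1.0, 0.0]),
    ("a brown dog", [0.8, 0.0, 0.2]),
]

retrieved = retrieve_top_k(image_emb, corpus, k=2)
prompt = build_prompt(retrieved)
print(retrieved)
print(prompt)
```

In a fuller pipeline the prompt would be passed to a large pre-trained language model to produce several pseudo sentences, which would then go through the fluency filter mentioned in the abstract. Those stages are omitted here because the abstract gives no detail on them.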

Code Implementations: 1 repo