CV · Nov 27, 2023

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

arXiv:2311.15879v2 · 55 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses the problem of describing novel objects in image captioning. It offers an incremental improvement: object knowledge is updated through an external memory rather than by retraining the model.

The paper tackles the challenge of open-world image captioning by introducing EVCap, a retrieval-augmented method that uses an external visual-name memory to update object knowledge without extensive data or scaling, achieving competitive performance with only 3.97M trainable parameters.

Large language models (LLMs)-based image captioning has the capability of describing objects not explicitly observed in training data; yet novel objects occur frequently, necessitating the requirement of sustaining up-to-date object knowledge for open-world comprehension. Instead of relying on large amounts of data and/or scaling up network parameters, we introduce a highly effective retrieval-augmented image captioning method that prompts LLMs with object names retrieved from External Visual-name memory (EVCap). We build ever-changing object knowledge memory using objects' visuals and names, enabling us to (i) update the memory at a minimal cost and (ii) effortlessly augment LLMs with retrieved object names by utilizing a lightweight and fast-to-train model. Our model, which was trained only on the COCO dataset, can adapt to out-of-domain without requiring additional fine-tuning or re-training. Our experiments conducted on benchmarks and synthetic commonsense-violating data show that EVCap, with only 3.97M trainable parameters, exhibits superior performance compared to other methods based on frozen pre-trained LLMs. Its performance is also competitive to specialist SOTAs that require extensive training.
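The retrieval idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `VisualNameMemory`, the function `build_prompt`, the prompt wording, and the three-dimensional vectors are all hypothetical stand-ins. A real system would use image embeddings from a vision encoder and pass the prompt to an LLM. The sketch only shows the two properties the abstract highlights: names are retrieved by visual similarity, and a new object becomes retrievable by appending one entry to the memory, with no retraining.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class VisualNameMemory:
    """Hypothetical external memory of (visual embedding, object name) pairs."""
    entries: list = field(default_factory=list)

    def add(self, embedding, name):
        # Updating knowledge is a plain append: no model weights change.
        self.entries.append((list(embedding), name))

    def retrieve(self, query, k=2):
        # Rank stored embeddings by similarity to the query image embedding
        # and return up to k distinct object names.
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        names = []
        for _, name in ranked:
            if name not in names:
                names.append(name)
            if len(names) == k:
                break
        return names


def build_prompt(names):
    # Assumed prompt template, for illustration only.
    return "Objects that may appear: " + ", ".join(names) + ". Describe the image."


memory = VisualNameMemory()
memory.add([1.0, 0.0, 0.0], "dog")
memory.add([0.0, 1.0, 0.0], "bicycle")

query = [0.1, 0.2, 0.9]                  # toy embedding of an image showing a novel object
before = memory.retrieve(query, k=1)     # best match among known objects only

memory.add([0.0, 0.1, 1.0], "drone")     # add the novel object to the memory
after = memory.retrieve(query, k=1)      # the new name is now retrievable

print(before, after)
print(build_prompt(after))
```

In this toy setup the captioning model itself never changes; only the memory grows, which is the low-cost update path the abstract describes.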

