CV · Nov 27, 2024

OPCap: Object-aware Prompting Captioning

arXiv:2412.00095v2 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses inaccurate captions produced by image captioning systems. It is incremental, because it builds on existing object detection and attribute prediction methods.

The paper tackles object hallucination in image captioning with a target-aware prompting strategy that feeds object labels, spatial information, and refined semantic features into the decoder. On the COCO and nocaps datasets, this reduces hallucination and improves caption quality.

In image captioning, describing an image with objects that are not actually present in it is referred to as object bias (or hallucination). To mitigate this issue, we propose a target-aware prompting strategy. First, an object detector extracts object labels and their spatial information from the image. Next, an attribute predictor refines the semantic features of these objects. The refined features are then integrated and fed into the decoder, improving the model's understanding of the image context. Experimental results on the COCO and nocaps datasets show that OPCap effectively mitigates hallucination and significantly improves the quality of generated captions.
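The pipeline described in the abstract (detector output, then attribute refinement, then decoder input) can be illustrated with a minimal sketch of the prompt-assembly step. This is not OPCap's implementation. The abstract does not say whether the prompt is text, learned embeddings, or both, so the sketch assumes a plain-text prompt. `Detection`, `build_object_prompt`, `min_score`, and `max_objects` are hypothetical names, and the detections are hand-written rather than produced by a real detector or attribute predictor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical record combining detector and attribute-predictor output."""
    label: str                               # object class, e.g. "dog"
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
    score: float                             # detector confidence in [0, 1]
    attribute: str = ""                      # assumed attribute, e.g. "brown"


def build_object_prompt(detections: List[Detection], width: int, height: int,
                        min_score: float = 0.5, max_objects: int = 10) -> str:
    """Assumed text format: '<attribute> <label> [x1 y1 x2 y2]; ...'."""
    # Drop low-confidence detections, which are a likely source of
    # hallucinated objects, and keep the most confident ones first.
    kept = sorted((d for d in detections if d.score >= min_score),
                  key=lambda d: d.score, reverse=True)[:max_objects]
    parts = []
    for d in kept:
        x1, y1, x2, y2 = d.box
        # Normalise coordinates to [0, 1] so the prompt is resolution-independent.
        coords = f"{x1 / width:.2f} {y1 / height:.2f} {x2 / width:.2f} {y2 / height:.2f}"
        name = f"{d.attribute} {d.label}".strip()
        parts.append(f"{name} [{coords}]")
    return "; ".join(parts)


if __name__ == "__main__":
    dets = [
        Detection("dog", (64, 96, 320, 432), 0.92, "brown"),
        Detection("frisbee", (320, 48, 384, 144), 0.81, "red"),
        Detection("cat", (0, 0, 10, 10), 0.20),  # filtered out by min_score
    ]
    print(build_object_prompt(dets, 640, 480))
    # prints: brown dog [0.10 0.20 0.50 0.90]; red frisbee [0.50 0.10 0.60 0.30]
```

In a real system the resulting string (or its embedding) would be prepended to the decoder input. The score threshold reflects the intuition that grounding the decoder in confidently detected objects should make it less likely to mention absent ones.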
