CV | AI | Aug 9, 2025

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance

arXiv:2508.06853v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of improving caption relevance in image captioning and is rated as an incremental advance.

The paper tackles the challenge of generating accurate and descriptive image captions by proposing AGIC, which amplifies salient visual regions to guide caption generation and uses a hybrid decoding strategy to balance fluency and diversity. Results show that it matches or surpasses several state-of-the-art models on the Flickr8k and Flickr30k datasets while achieving faster inference.

Despite significant progress in image captioning, generating accurate and descriptive captions remains a long-standing challenge. In this study, we propose Attention-Guided Image Captioning (AGIC), which amplifies salient visual regions directly in the feature space to guide caption generation. We further introduce a hybrid decoding strategy that combines deterministic and probabilistic sampling to balance fluency and diversity. To evaluate AGIC, we conduct extensive experiments on the Flickr8k and Flickr30k datasets. The results show that AGIC matches or surpasses several state-of-the-art models while achieving faster inference. Moreover, AGIC demonstrates strong performance across multiple evaluation metrics, offering a scalable and interpretable solution for image captioning.
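The two ideas in the abstract, amplifying salient regions in feature space and mixing deterministic with probabilistic decoding, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`amplify_features`, `hybrid_pick`), the scaling rule `1 + alpha * weight`, and the confidence threshold that switches between greedy choice and top-k sampling are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def amplify_features(features, attention, alpha=1.0):
    """Scale each patch feature vector by (1 + alpha * normalized attention).

    Assumed rule, not taken from the paper: patches with higher attention
    are boosted more, and no patch is scaled below its original magnitude.
    `attention` is assumed to hold non-negative scores with a positive sum.
    """
    total = sum(attention)
    weights = [a / total for a in attention]
    return [
        [x * (1.0 + alpha * w) for x in vec]
        for vec, w in zip(features, weights)
    ]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    peak = max(logits)
    exps = [math.exp((v - peak) / temperature) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def hybrid_pick(logits, rng, greedy_threshold=0.6, top_k=3, temperature=1.0):
    """Choose the next token id from a list of logits.

    Assumed hybrid rule: take the argmax when the model is confident
    (deterministic step), otherwise sample among the top-k candidates
    in proportion to their probabilities (probabilistic step).
    """
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    best = order[0]
    if probs[best] >= greedy_threshold:
        return best
    candidates = order[:top_k]
    return rng.choices(candidates, weights=[probs[i] for i in candidates], k=1)[0]


if __name__ == "__main__":
    # Two hypothetical image patches with 2-dimensional features.
    patches = [[1.0, 2.0], [3.0, 4.0]]
    scores = [1.0, 3.0]  # the second patch is the more salient one
    print(amplify_features(patches, scores))

    rng = random.Random(0)
    print(hybrid_pick([5.0, 0.0, 0.0], rng))            # confident: greedy
    print(hybrid_pick([1.0, 1.0, -10.0, -10.0], rng, top_k=2))  # uncertain: sampled
```

In this sketch, `alpha` controls how strongly salient regions are emphasized and `greedy_threshold` controls how often decoding falls back to sampling; both are hypothetical knobs rather than parameters reported in the paper.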

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
