CV · AI · CL · IV · Mar 20, 2024

Inserting Faces inside Captions: Image Captioning with Attention Guided Merging

arXiv:2405.02305v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the specific problem of name retrieval bias in image captioning for improving accessibility and retrieval of pictures, representing an incremental advancement with a new dataset and method.

The paper tackles the problem of image captioning models being inefficient and biased at retrieving people's names by introducing AstroCaptions, a dataset with thousands of public figures, and a novel post-processing method to insert identified names into captions using explainable AI and vision-language models. The method improves caption quality, reduces hallucinations, and achieves up to 93.2% insertion of detected persons, with gains in BLEU, ROUGE, CIDEr, and METEOR scores.

Image captioning models are widely used to describe recent and archived pictures with the objective of improving their accessibility and retrieval. Yet, these approaches tend to be inefficient and biased at retrieving people's names. In this work we introduce AstroCaptions, a dataset for the image captioning task. This dataset specifically contains thousands of public figures that are complex to identify for a traditional model. We also propose a novel post-processing method to insert identified people's names inside the caption using explainable AI tools and the grounding capabilities of vision-language models. The results obtained with this method show significant improvements in caption quality and the potential to reduce hallucinations. Up to 93.2% of the persons detected can be inserted in the image captions, leading to improvements in the BLEU, ROUGE, CIDEr and METEOR scores of each captioning model.
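The abstract does not spell out how the insertion step works, so the following is only a minimal sketch of one way attention-guided name insertion could look, not the authors' implementation. It assumes each caption token comes with a small 2D attention grid and each detected face comes with a name and a bounding box on that same grid. The names `PERSON_WORDS`, `Face`, `mass_in_box`, `insert_names`, and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical list of generic person words that a detected name may replace.
PERSON_WORDS = {"man", "woman", "person", "astronaut"}
ARTICLES = {"a", "an", "the"}

@dataclass
class Face:
    name: str
    box: tuple  # (x0, y0, x1, y1) in attention-grid cells, end-exclusive

def mass_in_box(attn, box):
    """Fraction of a token's attention that falls inside the face box."""
    x0, y0, x1, y1 = box
    inside = sum(sum(row[x0:x1]) for row in attn[y0:y1])
    total = sum(sum(row) for row in attn)
    return inside / total if total else 0.0

def insert_names(tokens, attn_maps, faces, threshold=0.5):
    """Replace the best-matching generic person word with each face's name.

    Returns the new caption and the share of faces that were inserted.
    """
    out = list(tokens)
    used = set()
    inserted = 0
    for face in faces:
        best_i, best_score = None, threshold
        for i, tok in enumerate(tokens):
            if i in used or tok.lower() not in PERSON_WORDS:
                continue
            score = mass_in_box(attn_maps[i], face.box)
            if score >= best_score:
                best_i, best_score = i, score
        if best_i is None:
            continue  # no token attends to this face strongly enough
        out[best_i] = face.name
        used.add(best_i)
        inserted += 1
        # Drop a leading article so "a man" becomes the bare name.
        if best_i > 0 and out[best_i - 1] is not None \
                and out[best_i - 1].lower() in ARTICLES:
            out[best_i - 1] = None
    caption = " ".join(t for t in out if t is not None)
    rate = inserted / len(faces) if faces else 0.0
    return caption, rate

# Toy 2x2 attention grids, one per token (made-up values for illustration).
uniform = [[0.25, 0.25], [0.25, 0.25]]
tokens = ["a", "man", "waves", "near", "a", "woman"]
attn_maps = [uniform, [[0.8, 0.1], [0.05, 0.05]], uniform, uniform,
             uniform, [[0.0, 0.1], [0.1, 0.8]]]
faces = [Face("Person A", (0, 0, 1, 1)), Face("Person B", (1, 1, 2, 2))]
caption, rate = insert_names(tokens, attn_maps, faces)
print(caption, rate)  # Person A waves near Person B 1.0
```

The returned rate is the share of detected faces that found a slot in the caption, the same kind of quantity as the insertion percentage quoted above. The threshold keeps a name out of the caption when no person word attends to the face region, which is one plausible way such a step could avoid adding wrong names.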

