CV · Jun 20, 2024

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

Yusuke Hirota, Ryo Hachiuma, Chao-Han Huck Yang, Yuta Nakashima

arXiv:2406.13912v122.126 citations

Originality: Incremental advance

AI Analysis

This research shows that making image captions more descriptive can amplify gender bias and hallucination, a concern for AI fairness and reliability, and it cautions against the trend toward ever more descriptive captions.

The study investigated the negative side effects of generative caption enrichment (GCE) in vision-language models. It found that enriched captions show more gender bias and hallucination than standard-format captions, and that models trained on enriched captions amplify gender bias by an average of 30.9% and increase hallucination by 59.5%.

Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual text. This generative approach to image caption enrichment further makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-format captions and recent GCE processes from the perspectives of "gender bias" and "hallucination", showing that enriched captions suffer from increased gender bias and hallucination. Furthermore, models trained on these enriched captions amplify gender bias by an average of 30.9% and increase hallucination by 59.5%. This study serves as a caution against the trend of making captions more descriptive.
