CV · AI · May 30, 2025

BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models

Huu-Thien Tran, Thanh-Dat Truong, Khoa Luu

arXiv:2505.24649v1 · 6.2 · 1 citation · h-index: 16 · 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Originality: Incremental advance

AI Analysis

This work addresses unreliable outputs in vision-language models, a problem critical to building trustworthy AI systems. It appears to be an incremental advance, however, since it builds on existing mitigation approaches.

The paper tackles the hallucination problem in large vision-language models with the proposed BIMA method, which achieves an average F1 score of 85.06% on the POPE benchmark and reduces the CHAIR_S (sentence-level) and CHAIR_I (instance-level) hallucination rates by 7.6% and 2.6%, respectively.
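To make the reported metrics concrete, here is a minimal sketch of how CHAIR_S, CHAIR_I, and a POPE-style F1 are typically computed. It assumes the objects mentioned in each generated caption have already been extracted and are compared against ground-truth object sets; the original CHAIR metric additionally maps synonyms onto MSCOCO object categories, which this sketch skips. The names `Caption`, `chair_scores`, and `pope_f1` are hypothetical helpers for illustration, not code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    # Hypothetical container: objects the model mentioned (assumed pre-extracted)
    # and objects actually present in the image. Each object counts once per caption.
    mentioned: set
    ground_truth: set


def chair_scores(captions):
    """Return (CHAIR_S, CHAIR_I); lower is better for both."""
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for cap in captions:
        # Mentioned objects that are not in the image are hallucinations.
        hallucinated = cap.mentioned - cap.ground_truth
        total_mentions += len(cap.mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            captions_with_hallucination += 1
    # CHAIR_S: fraction of captions containing at least one hallucinated object.
    chair_s = captions_with_hallucination / len(captions) if captions else 0.0
    # CHAIR_I: fraction of all mentioned objects that are hallucinated.
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    return chair_s, chair_i


def pope_f1(predictions, labels):
    """F1 over yes/no answers, treating "yes" as the positive class.

    Any prediction other than "yes" is treated as a negative answer.
    """
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and l == "yes" for p, l in pairs)
    fp = sum(p == "yes" and l != "yes" for p, l in pairs)
    fn = sum(p != "yes" and l == "yes" for p, l in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    caps = [
        Caption({"dog", "frisbee"}, {"dog", "frisbee", "grass"}),
        Caption({"cat", "sofa", "lamp"}, {"cat", "sofa"}),
    ]
    print(chair_scores(caps))  # one of two captions and one of five objects hallucinated
    print(pope_f1(["yes", "yes", "no", "no"], ["yes", "no", "yes", "no"]))
```

In this toy example, the second caption mentions a "lamp" that is not in the image, so half of the captions and one of the five mentioned objects are hallucinated. Both CHAIR scores are error rates, which is why a method is credited for reducing them.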

Large vision-language models have been widely adopted across many domains. However, building trustworthy systems on top of such large-scale models, whose behavior offers little interpretability, remains a significant challenge. One of the most prevalent failure modes of these systems is hallucination, where the language model generates a response that does not correspond to the visual content. Several approaches have been developed to mitigate this problem, and one prominent direction is to improve the decoding process. In this paper, we propose Bijective Maximum Likelihood Learning (BIMA), a new approach to hallucination mitigation grounded in normalizing flow theory. BIMA efficiently mitigates hallucination in prevailing vision-language models: it achieves an average F1 score of 85.06% on the POPE benchmark and reduces CHAIR_S and CHAIR_I by 7.6% and 2.6%, respectively. To the best of our knowledge, this is one of the first studies to use bijective mappings to reduce hallucination in large vision-language models.
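For readers unfamiliar with normalizing flows, the maximum-likelihood training they enable rests on the standard change-of-variables identity. For an invertible, differentiable map $f_\theta$ from a variable $x$ to a latent $z = f_\theta(x)$ with a simple base density $p_Z$, the exact log-likelihood and the resulting maximum-likelihood objective over samples $x_1, \dots, x_N$ are shown below. This is only the generic identity; the abstract does not state which variables BIMA maps or which flow architecture it uses, so it should not be read as the paper's exact training objective.

```latex
\log p_X(x) = \log p_Z\big(f_\theta(x)\big) + \log\left|\det \frac{\partial f_\theta(x)}{\partial x}\right|

\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \left[ \log p_Z\big(f_\theta(x_i)\big) + \log\left|\det \frac{\partial f_\theta(x_i)}{\partial x_i}\right| \right]
```

Bijectivity is what makes this tractable: because $f_\theta$ is invertible, the likelihood can be evaluated exactly rather than bounded, at the cost of computing the Jacobian log-determinant.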

