ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
This work addresses the limitations of static benchmarks for evaluating hallucinations in MLLMs. The problem matters for model reliability, although the contribution is incremental because it builds on existing evaluation methods.
The paper tackles the evaluation of hallucinations in multimodal large language models (MLLMs) by proposing ODE, an open-set, dynamic protocol that generates varied samples to assess object hallucinations at the existence and attribute levels. Its results reveal higher hallucination rates in MLLMs and suggest potential data contamination.
Hallucination poses a persistent challenge for multimodal large language models (MLLMs). However, existing benchmarks for evaluating hallucinations are generally static, which may overlook the potential risk of data contamination. To address this issue, we propose ODE, an open-set, dynamic protocol designed to evaluate object hallucinations in MLLMs at both the existence and attribute levels. ODE employs a graph-based structure to represent real-world object concepts, their attributes, and the distributional associations between them. This structure facilitates the extraction of concept combinations based on diverse distributional criteria, generating varied samples for structured queries that evaluate hallucinations in both generative and discriminative tasks. Through the generation of new samples, dynamic concept combinations, and varied distribution frequencies, ODE mitigates the risk of data contamination and broadens the scope of evaluation. This protocol is applicable to both general and specialized scenarios, including those with limited data. Experimental results demonstrate the effectiveness of our protocol, revealing that MLLMs exhibit higher hallucination rates when evaluated with ODE-generated samples, which indicates potential data contamination. Furthermore, these generated samples aid in analyzing hallucination patterns and fine-tuning models, offering an effective approach to mitigating hallucinations in MLLMs.
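The abstract describes the protocol only at a high level. The following is a minimal Python sketch, under stated assumptions, of how a graph of object concepts, attributes, and co-occurrence weights could drive the selection of concept combinations by a distributional criterion and the construction of discriminative yes/no probes at the existence and attribute levels. All names (`ConceptGraph`, `pairs`, `strongest_distractor`, `existence_queries`, `attribute_queries`, `hallucination_rate`), the [0, 1] weight scale, the threshold criteria, the distractor rule, and the toy weights are hypothetical and are not ODE's actual implementation. Image generation and generative-task scoring are omitted.

```python
# Hypothetical sketch of a concept-graph-driven probe generator; not the ODE authors' code.
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ConceptGraph:
    """Nodes are object concepts with attribute lists; undirected edges carry
    an assumed co-occurrence weight in [0, 1]."""
    attributes: dict = field(default_factory=dict)  # object -> list of attributes
    cooccur: dict = field(default_factory=dict)     # frozenset({a, b}) -> weight

    def add_object(self, name, attrs):
        self.attributes[name] = list(attrs)

    def add_association(self, a, b, weight):
        self.cooccur[frozenset((a, b))] = weight

    def weight(self, a, b):
        # Missing edges are treated as "never observed together".
        return self.cooccur.get(frozenset((a, b)), 0.0)

    def pairs(self, criterion):
        """Object pairs whose association weight satisfies criterion(weight)."""
        return [(a, b) for a, b in combinations(sorted(self.attributes), 2)
                if criterion(self.weight(a, b))]

    def strongest_distractor(self, pair):
        """Object outside `pair` most associated with it (an assumed distractor rule)."""
        candidates = [o for o in self.attributes if o not in pair]
        return max(candidates, key=lambda o: sum(self.weight(o, p) for p in pair))


def existence_queries(pair, absent):
    """Discriminative yes/no probes at the existence level."""
    return ([(f"Is there a {obj} in the image?", "yes") for obj in pair]
            + [(f"Is there a {absent} in the image?", "no")])


def attribute_queries(obj, true_attr, false_attr):
    """Discriminative yes/no probes at the attribute level."""
    return [(f"Is the {obj} {true_attr}?", "yes"),
            (f"Is the {obj} {false_attr}?", "no")]


def hallucination_rate(queries, answers):
    """Simple proxy: share of ground-truth 'no' questions the model answered 'yes'."""
    negatives = [i for i, (_, truth) in enumerate(queries) if truth == "no"]
    if not negatives:
        return 0.0
    wrong = sum(answers[i].strip().lower().startswith("yes") for i in negatives)
    return wrong / len(negatives)


# Toy graph with made-up weights, for illustration only.
g = ConceptGraph()
g.add_object("dog", ["brown", "white"])
g.add_object("frisbee", ["red", "blue"])
g.add_object("toaster", ["silver"])
g.add_association("dog", "frisbee", 0.8)
g.add_association("dog", "toaster", 0.05)
g.add_association("frisbee", "toaster", 0.01)

common = g.pairs(lambda w: w >= 0.5)  # frequently co-occurring combinations
rare = g.pairs(lambda w: w < 0.1)     # long-tail combinations
distractor = g.strongest_distractor(common[0])
queries = existence_queries(common[0], distractor) + attribute_queries("dog", "brown", "silver")
print(common, rare, distractor)
print(hallucination_rate(queries, ["yes", "yes", "yes", "yes", "no"]))  # prints 0.5
```

Swapping the criterion passed to `pairs` (frequent versus rare co-occurrence) is one way to mimic evaluation under varied distribution frequencies, and choosing the most strongly associated absent object as the distractor targets co-occurrence-driven hallucination. Both are design assumptions of this sketch rather than details taken from the paper.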