CL · CV · Feb 15, 2024

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

arXiv:2402.09801v3 · 28 citations · h-index: 12 · Has Code · EMNLP
AI Analysis

EFUF addresses the high cost and resource demands of hallucination mitigation in multimodal large language models, offering users a more efficient alternative.

The paper tackles object hallucination in multimodal large language models by proposing an efficient fine-grained unlearning framework (EFUF) that requires no paired data. It reduces hallucinations while preserving generation quality, with modest computational overhead.

Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.

Code Implementations: 1 repo
