CV · AI · Apr 19, 2024

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning

arXiv:2404.12966v5 · 12 citations · h-index: 26 · Has Code · MM
Originality: Incremental advance
AI Analysis

This work addresses a critical limitation in the reasoning of MLLMs used in AI applications, though its contribution is incremental because it builds on existing reinforcement learning paradigms.

The paper asks whether Multimodal Large Language Models (MLLMs) possess human-like compositional reasoning abilities. It introduces MARS-Bench, a benchmark for assumptive reasoning, and finds that most MLLMs are easily fooled by presuppositions. It then proposes Active Deduction (AD), a reinforcement learning method that significantly improves MLLMs' assumptive reasoning abilities without compromising their general performance.

Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge. However, whether these MLLMs possess human-like compositional reasoning abilities remains an open problem. To unveil their reasoning behaviors, we first curate a Multimodal Assumptive Reasoning Benchmark (MARS-Bench) in this paper. Interestingly, we find that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question, whereas such presuppositions appear naive to human reasoning. We also propose a simple yet effective method, Active Deduction (AD), a novel reinforcement learning paradigm that encourages the model to actively perform composite deduction before reaching a final decision. Equipped with the proposed AD method, an MLLM demonstrates significant improvements in assumptive reasoning abilities without compromising its general-purpose question-answering performance. We also provide extensive evaluations of both open-source and private MLLMs on MARS-Bench, along with experimental analyses of the AD method.
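As a rough illustration of the failure mode the abstract describes, the following minimal sketch compares a model's accuracy on a plain visual question with its accuracy on the same question once a presupposition is prefixed. Everything here is hypothetical: the `Item` record, the "Assume that ..." prompt template, the `evaluate` helper, and the toy predictor are assumptions for illustration, not the MARS-Bench data format or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Item:
    """Hypothetical benchmark record (NOT the actual MARS-Bench schema)."""
    image_id: str
    question: str
    base_answer: str        # answer grounded only in the image
    assumption: str         # presupposition added to the question
    assumptive_answer: str  # answer once the presupposition is taken into account

# (image_id, prompt) -> answer; stands in for any MLLM call.
Predictor = Callable[[str, str], str]

def assumptive_prompt(item: Item) -> str:
    # Hypothetical template: prefix the presupposition so the model must
    # reason from it rather than only from what the image shows.
    return f"Assume that {item.assumption}. {item.question}"

def _match(pred: str, gold: str) -> bool:
    # Simple exact match after normalizing whitespace and case.
    return pred.strip().lower() == gold.strip().lower()

def evaluate(predict: Predictor, items: List[Item]) -> Dict[str, float]:
    """Accuracy on plain questions vs. presupposition-prefixed questions."""
    n = len(items)
    if n == 0:
        return {"base_acc": 0.0, "assumptive_acc": 0.0}
    base = sum(_match(predict(it.image_id, it.question), it.base_answer) for it in items)
    assume = sum(
        _match(predict(it.image_id, assumptive_prompt(it)), it.assumptive_answer)
        for it in items
    )
    return {"base_acc": base / n, "assumptive_acc": assume / n}

# Toy "image content": a hypothetical image showing three apples.
IMAGE_COUNTS = {"img0": 3}

def image_only_model(image_id: str, prompt: str) -> str:
    # Ignores any presupposition in the prompt and reports what it "sees".
    return str(IMAGE_COUNTS[image_id])

items = [
    Item(
        image_id="img0",
        question="How many apples are on the table?",
        base_answer="3",
        assumption="two more apples are added to the table",
        assumptive_answer="5",
    )
]

if __name__ == "__main__":
    print(evaluate(image_only_model, items))  # {'base_acc': 1.0, 'assumptive_acc': 0.0}
```

The toy predictor answers the plain question correctly but ignores the presupposition, so its assumptive accuracy drops to zero; this gap between the two scores is the kind of behavior an assumptive-reasoning benchmark is meant to expose.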
