CL, CV, LG | Oct 10, 2023

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models

arXiv:2310.06627v4 | 7 citations | h-index: 17
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for better benchmarks for assessing counterfactual reasoning in AI models, though its contribution is incremental: it focuses on dataset creation rather than model improvement.

The authors evaluate counterfactual reasoning in multi-modal language models by introducing the C-VQA dataset, on which current models show performance drops of up to 40%, highlighting a gap between these models and human-like reasoning abilities.

Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal large language models, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern multi-modal large language models. This dataset is constructed by infusing original questions with counterfactual presuppositions, spanning various types such as numerical and boolean queries. It encompasses a mix of real and synthetic data, representing a wide range of difficulty levels. Our thorough evaluations of contemporary vision-language models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease, highlighting a significant gap between current models and human-like vision reasoning capabilities. We hope our dataset will serve as a vital benchmark for evaluating the counterfactual reasoning capabilities of models. Code and dataset are publicly available at https://bzhao.me/C-VQA/.
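The construction and evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the names `QAPair`, `add_presupposition`, `accuracy`, and `accuracy_drop` are hypothetical, the example question is invented, exact-match scoring is an assumption, and the drop is computed as an absolute difference in accuracy, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """A question about an image together with its reference answer."""
    question: str
    answer: str


def add_presupposition(presupposition: str, original: QAPair, new_answer: str) -> QAPair:
    """Prefix a question with a counterfactual presupposition.

    The new answer must be supplied by an annotator, because the
    presupposition changes what the correct answer is.
    """
    clause = presupposition.rstrip(",. ")
    # Lower-case the first letter so the question reads as a continuation.
    body = original.question[0].lower() + original.question[1:]
    return QAPair(question=f"{clause}, {body}", answer=new_answer)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy after trimming whitespace and lower-casing (an assumption)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)


def accuracy_drop(original_acc: float, counterfactual_acc: float) -> float:
    """Absolute drop in accuracy when moving to counterfactual questions."""
    return original_acc - counterfactual_acc


# Toy numerical example (invented for illustration, not taken from C-VQA).
original = QAPair("How many apples would be on the table?", "3")
counterfactual = add_presupposition("If one apple were removed", original, "2")
print(counterfactual.question)
```

A model would be scored once on the original questions and once on their counterfactual versions, and `accuracy_drop` applied to the two scores gives the kind of gap the abstract reports.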

Code Implementations: 1 repo
