CL · AI · Sep 10, 2019

WIQA: A dataset for "What if..." reasoning over procedural text

arXiv:1909.04739v1 · 1021 citations
Originality: Incremental advance
AI Analysis

This dataset addresses the problem of evaluating and improving AI models' causal reasoning abilities over procedural text, presenting an open challenge to the research community.

The authors introduced WIQA, the first large-scale dataset for 'What if...' reasoning over procedural text, containing 40k multiple-choice questions derived from influence graphs, and found that state-of-the-art models achieve 73.8% accuracy compared to human performance of 96.3%.

We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. WIQA contains three parts: a collection of paragraphs each describing a process, e.g., beach erosion; a set of crowdsourced influence graphs for each paragraph, describing how one change affects another; and a large (40k) collection of "What if...?" multiple-choice questions derived from the graphs. For example, given a paragraph about beach erosion, would stormy weather result in more or less erosion (or have no effect)? The task is to answer the questions, given their associated paragraph. WIQA contains three kinds of questions: perturbations to steps mentioned in the paragraph; external (out-of-paragraph) perturbations requiring commonsense knowledge; and irrelevant (no effect) perturbations. We find that state-of-the-art models achieve 73.8% accuracy, well below the human performance of 96.3%. We analyze the challenges, in particular tracking chains of influences, and present the dataset as an open challenge to the community.
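The task format described in the abstract can be illustrated with a small sketch. The code below is not the authors' method or the dataset's actual schema: the graph contents, the label strings, and the function names `chain_effect` and `accuracy` are all assumptions made for illustration. It models an influence graph as signed edges, answers a "What if...?" question by multiplying signs along a chain of influences, and scores predictions by plain accuracy.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical label strings for the three answer options (more / less / no effect).
MORE, LESS, NO_EFFECT = "more", "less", "no_effect"

# Signed influence graph: node -> [(influenced node, sign)],
# where +1 means "leads to more of" and -1 means "leads to less of".
InfluenceGraph = Dict[str, List[Tuple[str, int]]]

# Invented toy graph for the beach-erosion example; not taken from the dataset.
BEACH_EROSION: InfluenceGraph = {
    "stormy weather": [("stronger waves", +1)],
    "sea wall built": [("stronger waves", -1)],
    "stronger waves": [("sand carried away", +1)],
    "sand carried away": [("beach erosion", +1)],
}


def chain_effect(graph: InfluenceGraph, cause: str, outcome: str) -> str:
    """Follow a chain of influences from cause to outcome, multiplying signs.

    Simplification: the first path found by breadth-first search decides the
    answer, so conflicting paths are not reconciled.
    """
    queue = deque([(cause, +1)])
    seen = {cause}
    while queue:
        node, sign = queue.popleft()
        if node == outcome:
            return MORE if sign > 0 else LESS
        for neighbor, edge_sign in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, sign * edge_sign))
    # Cause is not connected to the outcome: an irrelevant perturbation.
    return NO_EFFECT


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions answered with the gold label."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

In this toy graph, `chain_effect(BEACH_EROSION, "stormy weather", "beach erosion")` follows three positive edges, while a perturbation that never reaches the outcome falls through to the no-effect label. A real model has no access to the graph at test time and must infer such chains from the paragraph alone, which is the difficulty the abstract highlights.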

Code Implementations: 1 repo