CL AI
Sep 10, 2019

WIQA: A dataset for "What if..." reasoning over procedural text

Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark

arXiv:1909.04739v1 30.71021 citations

Originality: Incremental advance

AI Analysis

This dataset addresses the problem of evaluating and improving AI models' causal reasoning abilities over procedural text, presenting an open challenge to the research community.

The authors introduced WIQA, the first large-scale dataset for 'What if...' reasoning over procedural text, containing 40k multiple-choice questions derived from influence graphs, and found that state-of-the-art models achieve 73.8% accuracy compared to human performance of 96.3%.

We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. WIQA contains three parts: a collection of paragraphs each describing a process, e.g., beach erosion; a set of crowdsourced influence graphs for each paragraph, describing how one change affects another; and a large (40k) collection of "What if...?" multiple-choice questions derived from the graphs. For example, given a paragraph about beach erosion, would stormy weather result in more or less erosion (or have no effect)? The task is to answer the questions, given their associated paragraph. WIQA contains three kinds of questions: perturbations to steps mentioned in the paragraph; external (out-of-paragraph) perturbations requiring commonsense knowledge; and irrelevant (no effect) perturbations. We find that state-of-the-art models achieve 73.8% accuracy, well below the human performance of 96.3%. We analyze the challenges, in particular tracking chains of influences, and present the dataset as an open challenge to the community.
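
To make the idea of deriving "What if...?" answers from an influence graph concrete, here is a minimal sketch. It is an illustration only: the graph contents, the signed-edge representation, and the `what_if` helper are assumptions made for this example, not the dataset's actual format or the authors' method. Each edge records whether more of the cause leads to more (+1) or less (-1) of the effect, and an answer is found by multiplying signs along a chain of influences.

```python
from collections import deque

# Hypothetical signed influence graph for a beach-erosion paragraph.
# Node names and edges are invented for illustration, not taken from WIQA.
# Each entry maps a cause to (effect, sign) pairs:
#   +1 means more of the cause leads to more of the effect,
#   -1 means more of the cause leads to less of the effect.
INFLUENCES = {
    "stormy weather": [("wave strength", +1)],
    "wave strength": [("sand carried away", +1)],
    "sand carried away": [("beach erosion", +1)],
    "sea walls": [("wave strength", -1)],
}


def what_if(graph, cause, effect):
    """Answer 'more', 'less', or 'no effect' for an increase in `cause`.

    Breadth-first search from `cause`, carrying the product of edge signs
    along the chain. If `effect` is unreachable, the perturbation is
    treated as irrelevant.
    """
    queue = deque([(cause, +1)])
    seen = {cause}
    while queue:
        node, sign = queue.popleft()
        if node == effect:
            return "more" if sign > 0 else "less"
        for next_node, edge_sign in graph.get(node, []):
            if next_node not in seen:
                seen.add(next_node)
                queue.append((next_node, sign * edge_sign))
    return "no effect"


if __name__ == "__main__":
    for cause in ("stormy weather", "sea walls", "ice cream sales"):
        print(cause, "->", what_if(INFLUENCES, cause, "beach erosion"))
```

In this toy graph, the three calls correspond loosely to the three answer choices described above: a chain of positive influences, a chain containing one negative influence, and a perturbation with no path to the effect. A real model has to infer such chains from the paragraph text alone, which is where the abstract locates much of the difficulty.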
