Reasoning about Goals, Steps, and Temporal Ordering with WikiHow
This work addresses the need for better commonsense inference benchmarks in AI, though its contribution is incremental because it builds on existing datasets and models.
The authors address reasoning about procedural events by introducing a wikiHow-based dataset for goal-step and step-step temporal relations. On a human-validated test set, state-of-the-art transformer models trail human performance by about 10% to 20%.
We propose a suite of reasoning tasks on two types of relations between procedural events: goal-step relations ("learn poses" is a step in the larger goal of "doing yoga") and step-step temporal relations ("buy a yoga mat" typically precedes "learn poses"). We introduce a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles. Our human-validated test set serves as a reliable benchmark for commonsense inference, with a gap of about 10% to 20% between the performance of state-of-the-art transformer models and human performance. Our automatically generated training set allows models to effectively transfer to out-of-domain tasks requiring knowledge of procedural events, with greatly improved performance on SWAG, Snips, and the Story Cloze Test in zero- and few-shot settings.
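To make the two relation types concrete, the following is a minimal sketch of how examples of each kind might be derived from how-to articles. It is an illustration under stated assumptions, not the authors' pipeline: the `Article` class, the function names, the four-way multiple-choice format, the random sampling of distractors, and the toy articles (apart from the yoga phrases quoted in the abstract) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """A toy stand-in for a how-to article (hypothetical structure)."""
    goal: str
    steps: List[str]  # steps in the order the article lists them


def goal_step_example(article, others, rng, n_choices=4):
    """Goal-step relation: given a goal, which candidate is one of its steps?"""
    positive = rng.choice(article.steps)
    # Assumption: distractors are drawn at random from other articles' steps.
    pool = [s for a in others for s in a.steps if s not in article.steps]
    candidates = rng.sample(pool, n_choices - 1) + [positive]
    rng.shuffle(candidates)
    return {
        "goal": article.goal,
        "candidates": candidates,
        "label": candidates.index(positive),
    }


def step_order_example(article, rng):
    """Step-step temporal relation: which of two steps typically comes first?"""
    i, j = sorted(rng.sample(range(len(article.steps)), 2))
    first = article.steps[i]
    pair = [first, article.steps[j]]
    rng.shuffle(pair)  # hide the original order from the model
    return {"goal": article.goal, "steps": pair, "label": pair.index(first)}


# Toy data; only the yoga goal and its first two steps come from the abstract.
articles = [
    Article("do yoga", ["buy a yoga mat", "learn poses", "practice daily"]),
    Article("bake bread", ["mix the dough", "let it rise", "bake the loaf"]),
    Article("change a tire", ["loosen the nuts", "jack up the car", "swap the wheel"]),
]

rng = random.Random(0)
print(goal_step_example(articles[0], articles[1:], rng))
print(step_order_example(articles[0], rng))
```

In both cases the label comes from the article's own structure rather than from manual annotation, which is one plausible reading of how a training set can be generated automatically while a smaller test set is validated by humans.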