CL · Oct 31, 2020

A Dataset for Tracking Entities in Open Domain Procedural Text

Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

arXiv:2011.08092v131.31001 citations

Originality: Incremental advance

AI Analysis

This work addresses a limitation of previous methods, which tracked only a small, pre-defined set of attributes. Lifting that restriction supports higher-fidelity tracking of entities in open-domain procedural text, for applications such as AI assistants.

The authors tackle state-change tracking in procedural text from arbitrary domains by introducing the first dataset with an unrestricted vocabulary. The result is a high-quality dataset of 29,928 state changes with 91.5% coverage, on which a current state-of-the-art generation model reaches only 16.1% F1.

We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where, given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted) and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. A current state-of-the-art generation model on this task achieves 16.1% F1 based on the BLEU metric, leaving enough room for novel model architectures.
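To make the task formulation concrete, the following is a minimal Python sketch of a state change tuple and of a soft F1 score over sets of tuples. It is an illustration under stated assumptions, not the authors' code: the `StateChange` class, the `as_text` sentence template, the attribute value `transparency`, and the `token_overlap` similarity are all hypothetical. In particular, `token_overlap` is a simple Jaccard measure standing in for BLEU, and the best-match averaging in `soft_f1` is one plausible way to turn a string similarity into precision and recall.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StateChange:
    """One open-vocabulary state change tuple for a single step."""
    entity: str
    attribute: str
    before: str
    after: str

    def as_text(self) -> str:
        # Hypothetical rendering template, chosen only for readability.
        return (f"{self.attribute} of {self.entity} was {self.before} "
                f"before and {self.after} afterwards")


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (an assumed stand-in for BLEU)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def soft_f1(predicted: List[StateChange], gold: List[StateChange]) -> float:
    """Soft F1: each tuple is credited with its best match on the other side."""
    if not predicted or not gold:
        return 0.0
    # Precision: how well each predicted tuple is supported by some gold tuple.
    precision = sum(
        max(token_overlap(p.as_text(), g.as_text()) for g in gold)
        for p in predicted
    ) / len(predicted)
    # Recall: how well each gold tuple is covered by some predicted tuple.
    recall = sum(
        max(token_overlap(g.as_text(), p.as_text()) for p in predicted)
        for g in gold
    ) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only; the attribute name is an assumption.
gold = [StateChange("car window", "transparency", "foggy", "clear")]
pred = [StateChange("potato", "state", "whole", "cut")]
exact_score = soft_f1(gold, gold)
partial_score = soft_f1(pred, gold)
```

Because every rendered tuple shares the template words, even an unrelated prediction earns partial credit under this stand-in similarity; a real evaluation would need a similarity that discounts such boilerplate overlap.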
