CL · Apr 16, 2021

ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning

Rujun Han, I-Hung Hsu, Jiao Sun, Julia Baylon, Qiang Ning, Dan Roth, Nanyun Peng

arXiv:2104.08350v2

Originality: Synthesis-oriented

AI Analysis

ESTER provides a new benchmark for evaluating machines' ability to understand event semantic relations. It addresses a gap left by existing event-centric datasets, which focus mostly on event arguments or temporal relations.

The authors introduce ESTER, a machine reading comprehension dataset for reasoning about the five most common event semantic relations, containing more than 6K questions and 10.1K event relation pairs. Experiments show that state-of-the-art systems score significantly below humans (for example, 22.1% versus 36.0% token-based exact-match), which highlights how challenging the benchmark is.

Understanding how events are semantically related to each other is the essence of reading comprehension. Recent event-centric reading comprehension datasets focus mostly on event arguments or temporal relations. While these tasks partially evaluate machines' narrative understanding, human-like reading comprehension requires the capability to process event-based information beyond arguments and temporal reasoning. For example, to understand causality between events, we need to infer motivation or purpose; to establish event hierarchy, we need to understand the composition of events. To facilitate these tasks, we introduce ESTER, a comprehensive machine reading comprehension (MRC) dataset for Event Semantic Relation Reasoning. The dataset leverages natural language queries to reason about the five most common event semantic relations, provides more than 6K questions, and captures 10.1K event relation pairs. Experimental results show that current state-of-the-art (SOTA) systems achieve token-based exact-match, F1, and event-based HIT@1 scores of 22.1%, 63.3%, and 83.5%, respectively, all significantly below human performance (36.0%, 79.6%, and 100%), highlighting our dataset as a challenging benchmark.
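The abstract reports three metrics without defining them here. As a rough illustration only, the sketch below computes SQuAD-style token exact-match and token F1 (taking the maximum over one or more gold reference answers) plus a simple HIT@1 check. The whitespace-and-lowercase normalization, the max-over-references aggregation, and the HIT@1 rule (the first predicted event must appear among the gold events) are all assumptions, not ESTER's official evaluation script; every function name is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable


def normalize_tokens(text: str) -> list[str]:
    # Assumed normalization: lowercase and split on whitespace.
    # Real evaluation scripts may also strip punctuation and articles.
    return text.lower().split()


def token_exact_match(pred: str, gold: str) -> float:
    # 1.0 only if the normalized token sequences are identical.
    return float(normalize_tokens(pred) == normalize_tokens(gold))


def token_f1(pred: str, gold: str) -> float:
    # Harmonic mean of token-level precision and recall (bag-of-tokens overlap).
    p, g = normalize_tokens(pred), normalize_tokens(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def max_over_golds(metric: Callable[[str, str], float], pred: str, golds: Iterable[str]) -> float:
    # Assumed aggregation: score against each gold reference and keep the best.
    return max(metric(pred, g) for g in golds)


def hit_at_1(pred_events: list[str], gold_events: set[str]) -> float:
    # Hypothetical HIT@1: 1.0 if the top-ranked predicted event is a gold event.
    if not pred_events:
        return 0.0
    return float(pred_events[0].lower() in {e.lower() for e in gold_events})


if __name__ == "__main__":
    golds = ["strike ended", "the protest"]
    print(max_over_golds(token_exact_match, "the strike ended", golds))
    print(max_over_golds(token_f1, "the strike ended", golds))
    print(hit_at_1(["ended", "strike"], {"ended"}))
```

In this toy usage, the prediction "the strike ended" fails exact-match against both references but earns partial F1 credit (about 0.8) against "strike ended", which illustrates why the reported F1 can be much higher than exact-match.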

