CL, Dec 29, 2019

ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension

Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner

arXiv:1912.12598v13.319 citations

Originality: Synthesis-oriented

AI Analysis

ORB provides a comprehensive, unrestricted test bed for researchers working on machine reading comprehension, making it easier to evaluate a single model's capabilities across a variety of reading phenomena.

Evaluating machine reading comprehension models across many diverse datasets is tedious and time-consuming. The authors address this by introducing ORB, an open evaluation server that reports performance on seven diverse datasets and includes synthetic augmentations for out-of-domain testing.

Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given the availability of many such datasets, comprehensive and reliable evaluation is tedious and time-consuming for researchers working on this problem. We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating testing a single model's capability in understanding a wide variety of reading phenomena. The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning for general reading facility. As more suitable datasets are released, they will be added to the evaluation server. We also collect and include synthetic augmentations for these datasets, testing how well models can handle out-of-domain questions.
