R3: A Reading Comprehension Benchmark Requiring Reasoning Processes
This work addresses the limited explainability of NLP systems and the resulting overestimation of their language understanding. It is incremental, however, since it builds on existing reading comprehension benchmarks.
The authors tackled the lack of explicit reasoning in question answering systems by proposing a new reading comprehension task that requires models to provide both final answers and reasoning processes, resulting in the R3 dataset with over 60K question-answer pairs annotated with Text Reasoning Meaning Representations (TRMR).
Existing question answering systems can only predict answers without explicit reasoning processes, which hinders their explainability and leads us to overestimate their ability to understand and reason over natural language. In this work, we propose a novel reading comprehension task in which a model is required to provide both final answers and reasoning processes. To this end, we introduce a formalism for reasoning over unstructured text, namely Text Reasoning Meaning Representation (TRMR). TRMR consists of three phases and is expressive enough to characterize the reasoning process needed to answer reading comprehension questions. We develop an annotation platform to facilitate TRMR annotation, and release the R3 dataset, a Reading comprehension benchmark Requiring Reasoning processes. R3 contains over 60K question-answer pairs and their TRMRs. Our dataset is available at http://anonymous.