CL Jun 25, 2023

SciMRC: Multi-perspective Scientific Machine Reading Comprehension

Xiao Zhang, Heqi Zheng, Yuxiang Nie, Heyan Huang, Xian-Ling Mao

arXiv:2306.14149v1 16.781 citations h-index: 32

Originality: Synthesis-oriented

AI Analysis

This work addresses the limited diversity of reader perspectives in SMRC datasets. The contribution is incremental, since it builds on existing data collection efforts.

The authors address the lack of multi-perspective datasets in scientific machine reading comprehension by creating SciMRC, a dataset of 6,057 question-answer pairs drawn from 741 papers and covering beginner, student, and expert perspectives. Their experiments with pre-trained models suggest that the dataset is challenging for machine comprehension.

Scientific machine reading comprehension (SMRC) aims to understand scientific texts through interactions with humans by given questions. As far as we know, there is only one dataset focused on exploring full-text scientific machine reading comprehension. However, the dataset has ignored the fact that different readers may have different levels of understanding of the text, and only includes single-perspective question-answer pairs, leading to a lack of consideration of different perspectives. To tackle the above problem, we propose a novel multi-perspective SMRC dataset, called SciMRC, which includes perspectives from beginners, students and experts. Our proposed SciMRC is constructed from 741 scientific papers and 6,057 question-answer pairs. Each perspective of beginners, students and experts contains 3,306, 1,800 and 951 QA pairs, respectively. The extensive experiments on SciMRC by utilizing pre-trained models suggest the importance of considering perspectives of SMRC, and demonstrate its challenging nature for machine comprehension.
