Dataset for the First Evaluation on Chinese Machine Reading Comprehension
The dataset is a valuable resource for researchers working on Chinese NLP, though the contribution is incremental because it extends existing MRC dataset designs to a new language.
The authors introduce a new Chinese machine reading comprehension dataset to address the lack of non-English resources. It covers two task types, cloze-style and user-query reading comprehension, each with large-scale training data and human-annotated validation and hidden test sets. They also hosted the first evaluation on the dataset, CMRC-2017, which attracted tens of participants.
Machine Reading Comprehension (MRC) has recently attracted considerable attention. However, existing reading comprehension datasets are mostly in English. To add diversity to reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset to accelerate related research in the community. The proposed dataset contains two different types, cloze-style reading comprehension and user query reading comprehension, each associated with large-scale training data as well as human-annotated validation and hidden test sets. Along with this dataset, we also hosted the first Evaluation on Chinese Machine Reading Comprehension (CMRC-2017) and successfully attracted tens of participants, which suggests the potential impact of this dataset.