CL · Sep 25, 2017

Dataset for the First Evaluation on Chinese Machine Reading Comprehension

arXiv:1709.08299v2 · 1099 citations
AI Analysis

The dataset is a valuable resource for researchers working on Chinese NLP, though the contribution is incremental: it extends existing MRC dataset designs to a new language.

The authors introduced a new Chinese machine reading comprehension dataset to address the lack of non-English resources. It covers cloze-style and user-query question types, with large-scale training data and human-annotated validation and test sets. They also hosted the first CMRC-2017 evaluation, which attracted tens of participants.

Machine Reading Comprehension (MRC) has recently become enormously popular and has attracted a great deal of attention. However, existing reading comprehension datasets are mostly in English. To add diversity to reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset to accelerate related research in the community. The proposed dataset contains two types of reading comprehension, cloze-style and user query, accompanied by large-scale training data as well as human-annotated validation and hidden test sets. Along with this dataset, we also hosted the first Evaluation on Chinese Machine Reading Comprehension (CMRC-2017), which attracted tens of participants, suggesting the potential impact of this dataset.
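To make the cloze-style format concrete, here is a minimal Python sketch of how such a sample could be built and scored. It assumes pre-segmented, space-separated tokens (real Chinese text would first need word segmentation), a hypothetical `XXXXX` blank marker, and plain exact-match accuracy. The names `ClozeSample`, `make_cloze_sample`, and `accuracy` are illustrative only and do not describe the actual CMRC-2017 file format or official scoring script.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical blank marker; not necessarily the token used in CMRC-2017.
PLACEHOLDER = "XXXXX"


@dataclass
class ClozeSample:
    document: list[str]  # context sentences, pre-segmented into space-separated tokens
    query: str           # query sentence with one token replaced by PLACEHOLDER
    answer: str          # the removed token (gold answer)


def make_cloze_sample(document: list[str], query_sentence: str, answer: str) -> ClozeSample:
    """Blank out the first occurrence of `answer` in `query_sentence`.

    The answer must also occur as a token in the document, so that a reader
    model can recover it from the context.
    """
    tokens = query_sentence.split()
    if answer not in tokens:
        raise ValueError("answer must be a token of the query sentence")
    if not any(answer in sentence.split() for sentence in document):
        raise ValueError("answer must appear in the document")
    tokens[tokens.index(answer)] = PLACEHOLDER
    return ClozeSample(document=list(document), query=" ".join(tokens), answer=answer)


def accuracy(predictions: list[str], samples: list[ClozeSample]) -> float:
    """Fraction of samples whose predicted token exactly matches the gold answer."""
    if len(predictions) != len(samples):
        raise ValueError("need exactly one prediction per sample")
    if not samples:
        return 0.0
    correct = sum(pred == sample.answer for pred, sample in zip(predictions, samples))
    return correct / len(samples)


# Toy English example (hypothetical data, not taken from the dataset).
doc = ["the cat sat on the mat", "the dog slept by the door"]
sample = make_cloze_sample(doc, "a cat was on the mat", "cat")
print(sample.query)                 # a XXXXX was on the mat
print(accuracy(["cat"], [sample]))  # 1.0
```

Requiring the answer to occur in the document is one common way to keep a cloze question answerable from context; whether CMRC-2017 applies exactly this constraint is not stated here.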

Code Implementations: 1 repo