CL · Sep 16, 2019

KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension

Seungyoung Lim, Myungji Kim, Jooyoul Lee

arXiv:1909.07005v2 · 103 citations

Originality: Synthesis-oriented

AI Analysis

This provides a valuable resource for researchers and developers working on Korean language processing, though the contribution is incremental, since it adapts an existing dataset format to a new language.

The authors introduced KorQuAD1.0, a large-scale Korean dataset with over 70,000 human-generated question-answer pairs for extractive machine reading comprehension, aimed at advancing multilingual NLP research.

Machine Reading Comprehension (MRC) is a task that requires a machine to understand natural language and answer questions by reading a document. It is the core of automatic response technologies such as chatbots and automated customer support systems. We present the Korean Question Answering Dataset (KorQuAD), a large-scale Korean dataset for the extractive machine reading comprehension task. It consists of 70,000+ human-generated question-answer pairs on Korean Wikipedia articles. We release KorQuAD1.0 and launch a challenge at https://KorQuAD.github.io to encourage the development of multilingual natural language processing research.
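In the extractive setting, every answer is a verbatim span of the source paragraph, identified by its text and a character offset. The sketch below illustrates this with one hypothetical paragraph and two question-answer pairs. It assumes a SQuAD v1.1-style JSON layout. That layout, its field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`), and the sample Korean text and ids are illustrative assumptions, not content taken from the KorQuAD release.

```python
import json
from typing import Iterator, Tuple

# Hypothetical example in an assumed SQuAD v1.1-style JSON layout.
# The Korean text, title, and ids are invented for illustration only.
EXAMPLE_JSON = """
{
  "data": [
    {
      "title": "세종대왕",
      "paragraphs": [
        {
          "context": "세종대왕은 조선의 제4대 국왕이다. 그는 1443년에 훈민정음을 창제하였다.",
          "qas": [
            {"id": "ex-1", "question": "조선의 제4대 국왕은 누구인가?",
             "answers": [{"text": "세종대왕", "answer_start": 0}]},
            {"id": "ex-2", "question": "훈민정음은 몇 년에 창제되었는가?",
             "answers": [{"text": "1443년", "answer_start": 23}]}
          ]
        }
      ]
    }
  ]
}
"""


def iter_examples(dataset: dict) -> Iterator[Tuple[str, str, str, str, int]]:
    """Flatten the nested article -> paragraph -> question -> answer structure."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield (qa["id"], qa["question"], context,
                           answer["text"], answer["answer_start"])


def span_is_consistent(context: str, text: str, start: int) -> bool:
    """Check that the answer text appears verbatim at the given character offset.

    Python indexes strings by code point, so each precomposed Hangul
    syllable counts as one character.
    """
    return context[start:start + len(text)] == text


dataset = json.loads(EXAMPLE_JSON)
for qid, question, context, text, start in iter_examples(dataset):
    print(qid, question, "->", text, span_is_consistent(context, text, start))
```

Storing an offset alongside the answer text lets a loader recover the exact span even when the same string occurs more than once in a paragraph. The consistency check is a cheap sanity test to run before training an extractive model on such data.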

