CL AI Feb 17, 2023

Natural Response Generation for Chinese Reading Comprehension

Nuo Chen, Hongguang Li, Yinan Bao, Baoyuan Wang, Jia Li

arXiv:2302.08817v221.1131 citations h-index: 25 Has Code

Originality Incremental advance

AI Analysis

This work addresses natural response generation in Chinese reading comprehension for conversation agents. It is an incremental advance that contributes a new dataset and a new method.

Current machine reading comprehension benchmarks label answers as extracted spans or candidate choices rather than natural responses. To address this limitation, the authors constructed Penguin, a new dataset with 200k training examples for natural response generation in Chinese, and developed Prompt-BART, a method that fine-tunes pre-trained generative language models with a mixture of prefix prompts to generate human-like responses.

Machine reading comprehension (MRC) is an important area of research on conversation agents and has drawn a lot of attention. However, current MRC benchmarks have a notable limitation: the labeled answers are mostly either spans extracted from the target corpus or choices among given candidates, ignoring the natural aspect of high-quality responses. As a result, MRC models trained on these datasets cannot generate human-like responses in real QA scenarios. To this end, we construct a new dataset called Penguin to promote MRC research, providing a training and test bed for natural response generation in real scenarios. Concretely, Penguin consists of 200k training examples with high-quality, fluent, and well-informed responses. Penguin is the first benchmark for natural response generation in Chinese MRC on a relatively large scale. To address the challenges in Penguin, we develop two strong baselines: an end-to-end framework and a two-stage framework. Following that, we further design Prompt-BART, which fine-tunes pre-trained generative language models with a mixture of prefix prompts on Penguin. Extensive experiments validate the effectiveness of this design.
