CL · May 15, 2018

A Manually Annotated Chinese Corpus for Non-task-oriented Dialogue Systems

Jing Li, Yan Song, Haisong Zhang, Shuming Shi

arXiv:1805.05542v1 · 0.73 citations

Originality: Synthesis-oriented

AI Analysis

This corpus is a valuable resource for researchers and developers working on Chinese non-task-oriented dialogue systems, though the contribution is incremental because it builds on existing data collection and annotation efforts.

The authors address the lack of a large-scale annotated dataset for non-task-oriented dialogue systems by building a corpus of over 27K prompts and 82K responses collected from social media, annotated with a 5-grade rating scheme. Their experiments confirm that the corpus is useful for training response selection models.

This paper presents a large-scale corpus for non-task-oriented dialogue response selection, which contains over 27K distinct prompts and more than 82K responses collected from social media. To annotate this corpus, we define a 5-grade rating scheme: bad, mediocre, acceptable, good, and excellent, according to the relevance, coherence, informativeness, interestingness, and the potential to move a conversation forward. To test the validity and usefulness of the produced corpus, we compare various unsupervised and supervised models for response selection. Experimental results confirm that the proposed corpus is helpful in training response selection models.
