An Annotation Scheme of A Large-scale Multi-party Dialogues Dataset for Discourse Parsing and Machine Comprehension
This work addresses the problem of understanding complex multi-party dialogues, a concern for researchers in natural language processing. The contribution is incremental in that it builds on existing corpus data.
The authors address the lack of large-scale annotated datasets for multi-party dialogues by proposing an annotation scheme based on the Ubuntu Chat Corpus, resulting in what they describe as the first such corpus for discourse parsing and machine reading comprehension tasks.
In this paper, we propose a scheme for annotating large-scale multi-party chat dialogues for discourse parsing and machine comprehension. The main goal of this project is to help understand multi-party dialogues. Our dataset is based on the Ubuntu Chat Corpus. For each multi-party dialogue, we annotate the discourse structure and question-answer pairs. To our knowledge, this is the first large-scale corpus for multi-party dialogue discourse parsing, and we are the first to propose the task of machine reading comprehension for multi-party dialogues.
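
As a rough illustration of what one annotated dialogue might contain, the sketch below pairs a list of utterances with discourse links and question-answer pairs. It is a minimal sketch under stated assumptions, not the dataset's actual format: the class names, field names, the relation label, and the example utterances are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    speaker: str  # hypothetical field: who wrote the message
    text: str     # hypothetical field: the message content


@dataclass
class DiscourseRelation:
    source: int  # index of the utterance the link starts from
    target: int  # index of the utterance the link points to
    label: str   # hypothetical relation label, not the paper's inventory


@dataclass
class QAPair:
    question: str  # question asked about the dialogue
    answer: str    # answer text (illustrative only)


@dataclass
class AnnotatedDialogue:
    utterances: List[Utterance]
    relations: List[DiscourseRelation] = field(default_factory=list)
    qa_pairs: List[QAPair] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """Check that every discourse link joins two distinct, existing utterances."""
        n = len(self.utterances)
        return all(
            0 <= r.source < n and 0 <= r.target < n and r.source != r.target
            for r in self.relations
        )


# Invented example dialogue; the content is made up for illustration.
dialogue = AnnotatedDialogue(
    utterances=[
        Utterance("alice", "How do I install a package from the terminal?"),
        Utterance("bob", "Use apt-get install followed by the package name."),
        Utterance("carol", "Is anyone else seeing slow mirrors today?"),
    ],
    relations=[DiscourseRelation(source=0, target=1, label="question-answer")],
    qa_pairs=[QAPair("Who tells alice how to install a package?", "bob")],
)
```

Keeping the discourse links as index pairs over the utterance list is one simple design choice: it lets several speakers' threads interleave in a single dialogue while each link stays unambiguous.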