A Pilot Study on Dialogue-Level Dependency Parsing for Chinese
This work addresses a gap in Chinese dialogue parsing. The contribution is incremental, since it builds on existing syntactic treebanks and methods.
The paper tackles dialogue-level dependency parsing for Chinese. It creates a human-annotated corpus of 850 dialogues with 199,803 dependencies, develops methods for zero-shot and few-shot scenarios based on signal-based transformation and data selection, and reports baseline results that demonstrate these methods' effectiveness.
Dialogue-level dependency parsing has received insufficient attention, especially for Chinese. To this end, we draw on ideas from syntactic dependency and rhetorical structure theory (RST), developing a high-quality human-annotated corpus, which contains 850 dialogues and 199,803 dependencies. Considering that such tasks suffer from high annotation costs, we investigate zero-shot and few-shot scenarios. Based on an existing syntactic treebank, we adopt a signal-based method to transform seen syntactic dependencies into unseen ones between elementary discourse units (EDUs), where the signals are detected by masked language modeling. In addition, we apply single-view and multi-view data selection to obtain reliable pseudo-labeled instances. Experimental results show the effectiveness of these baselines. Moreover, we discuss several crucial points about our dataset and approach.
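The abstract does not spell out how syntactic arcs become EDU-level arcs or how pseudo-labeled instances are filtered, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each EDU attaches to the EDU containing the head of its local syntactic root (the word whose head leaves the EDU span), and that a detected signal word picks the relation label. The masked-language-modeling detector is replaced by a precomputed `signals` dictionary. "Single view" is approximated as one model's confidence threshold and "multi view" as agreement between two models. The signal words, relation labels, the `elaboration` fallback, the 0.9 threshold, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical signal-to-relation table; the paper detects signals with masked
# language modeling, which this sketch replaces with precomputed signal words.
SIGNAL_TO_RELATION: Dict[str, str] = {"因为": "cause", "但是": "contrast", "然后": "temporal"}
DEFAULT_RELATION = "elaboration"  # hypothetical fallback when no signal is found

Arc = Tuple[int, int, str]  # (head EDU, dependent EDU, relation label)


def edu_of(word: int, edu_spans: Sequence[Tuple[int, int]]) -> int:
    """Return the index of the EDU whose [start, end) span contains `word`."""
    for i, (start, end) in enumerate(edu_spans):
        if start <= word < end:
            return i
    raise ValueError(f"word {word} is not covered by any EDU")


def project_to_edus(heads: Sequence[int], edu_spans: Sequence[Tuple[int, int]],
                    signals: Dict[int, str]) -> List[Arc]:
    """Lift word-level syntactic arcs (heads[i] = head of word i, -1 = root) to EDU arcs."""
    arcs: List[Arc] = []
    for dep_edu, (start, end) in enumerate(edu_spans):
        # The EDU's local root: the first word whose head lies outside the span (or is -1).
        root: Optional[int] = next(
            (w for w in range(start, end) if not start <= heads[w] < end), None)
        if root is None or heads[root] == -1:
            continue  # the EDU containing the sentence root receives no incoming arc
        head_edu = edu_of(heads[root], edu_spans)
        label = SIGNAL_TO_RELATION.get(signals.get(dep_edu, ""), DEFAULT_RELATION)
        arcs.append((head_edu, dep_edu, label))
    return arcs


def single_view_select(instances: Sequence, confidences: Sequence[float],
                       threshold: float = 0.9) -> List:
    """Single view (assumed): keep pseudo-labeled instances one model is confident about."""
    return [x for x, c in zip(instances, confidences) if c >= threshold]


def multi_view_select(preds_a: Sequence[List[Arc]], preds_b: Sequence[List[Arc]]) -> List[int]:
    """Multi view (assumed): keep indices of instances on which two views predict identical arcs."""
    return [i for i, (a, b) in enumerate(zip(preds_a, preds_b)) if a == b]


if __name__ == "__main__":
    # "我 没去 | 因为 下雨": two EDUs; word 1 ("没去") is the sentence root.
    heads = [1, -1, 3, 1]
    spans = [(0, 2), (2, 4)]
    print(project_to_edus(heads, spans, {1: "因为"}))  # [(0, 1, 'cause')]
```

Attaching each EDU through the single word whose head leaves its span is a common way to collapse a word-level tree into a unit-level tree. When several words in one EDU point outside it, this sketch simply takes the first, which a real system would need to resolve more carefully.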