Unifying Discourse Resources with Dependency Framework
This work tackles the problem of limited labeled data for text-level discourse analysis, which is a bottleneck for researchers working with Chinese text.
This paper addresses the scarcity of labeled data for text-level discourse analysis by unifying multiple Chinese discourse corpora, annotated under different schemes, within a single discourse dependency framework. The authors design semi-automatic methods for the conversion and implement benchmark dependency parsers to test how the unified data can improve parsing performance.
For text-level discourse analysis, there are various discourse schemes but relatively little labeled data, because discourse research is still immature and annotating the inner logic of a text is labor-intensive. In this paper, we attempt to unify multiple Chinese discourse corpora annotated under different schemes within a discourse dependency framework, designing semi-automatic methods to convert them into dependency structures. We also implement several benchmark dependency parsers and investigate how they can leverage the unified data to improve performance.
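As a rough illustration of what converting an annotation into a dependency structure can involve, the sketch below applies a generic head-percolation rule to a binary discourse tree whose internal nodes carry a nuclearity label. It is not the semi-automatic procedure designed in this paper. The `Node` class, the `to_dependencies` function, the nuclearity codes ("NS", "SN", "NN"), the relation names, and the convention that the left child heads a multi-nuclear node are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Hypothetical discourse tree node: a leaf holds an EDU index,
    an internal node holds two children plus nuclearity and relation."""
    edu: Optional[int] = None          # 1-based EDU index (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    nuclearity: Optional[str] = None   # assumed codes: "NS", "SN", "NN"
    relation: Optional[str] = None


def to_dependencies(root: Node) -> Dict[int, Tuple[int, str]]:
    """Map each EDU to (head EDU, relation); head 0 marks the root."""
    arcs: Dict[int, Tuple[int, str]] = {}

    def head_of(node: Node) -> int:
        if node.edu is not None:       # a leaf is its own head
            return node.edu
        left_head = head_of(node.left)
        right_head = head_of(node.right)
        # Assumption: the nucleus side supplies the head; for "NN"
        # the left child is chosen by convention.
        if node.nuclearity == "SN":
            head, dependent = right_head, left_head
        else:
            head, dependent = left_head, right_head
        arcs[dependent] = (head, node.relation)
        return head

    arcs[head_of(root)] = (0, "ROOT")
    return arcs


# Toy tree over three EDUs: EDU 1 is elaborated by the span (2, 3),
# and inside that span EDU 2 is a satellite of EDU 3.
tree = Node(
    left=Node(edu=1),
    right=Node(left=Node(edu=2), right=Node(edu=3),
               nuclearity="SN", relation="cause"),
    nuclearity="NS",
    relation="elaboration",
)
deps = to_dependencies(tree)
```

In this toy case EDU 1 attaches to the artificial root, EDU 3 attaches to EDU 1, and EDU 2 attaches to EDU 3. Corpora that do not annotate nuclearity, or that use non-binary structures, would need additional rules that this sketch does not attempt to cover.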