Unsupervised Inference of Data-Driven Discourse Structures using a Tree Auto-Encoder
This work addresses the need for robust discourse structures in NLP tasks, though it appears incremental, since it extends an existing latent tree induction framework.
The paper tackles the lack of high-quality discourse trees for downstream applications. It proposes an unsupervised method that generates tree structures with a tree auto-encoder, with the goal of producing larger and more diverse discourse treebanks that complement existing task-specific models.
With a growing need for robust and general discourse structures in many downstream tasks and real-world applications, the current lack of high-quality, high-quantity discourse trees is a severe shortcoming. To alleviate this limitation, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others. However, because discourse trees are especially difficult to annotate, we first develop the method to complement task-specific models in generating much larger and more diverse discourse treebanks.
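To make the idea of "latent tree induction with an auto-encoding objective" concrete, the following is a minimal, untrained sketch, not the paper's model. It assumes greedy bottom-up merging of adjacent units (a common choice in latent tree induction; the exact inference procedure used in the paper is not stated here). The helpers `compose`, `decompose` and `score` are hypothetical placeholders for what would be learned networks. The encoder builds a binary tree over unit vectors (e.g. elementary discourse unit embeddings), the decoder expands the root back down the same tree, and the reconstruction error of the leaves is the unsupervised training signal that would shape the induced structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vector = List[float]


@dataclass
class Node:
    vec: Vector
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_index: Optional[int] = None  # set only for leaves (input units)


def compose(a: Vector, b: Vector) -> Vector:
    # Hypothetical encoder cell: element-wise mean of the two children.
    return [(x + y) / 2 for x, y in zip(a, b)]


def decompose(parent: Vector) -> Tuple[Vector, Vector]:
    # Hypothetical decoder cell: copy the parent vector into both children.
    return list(parent), list(parent)


def score(a: Vector, b: Vector) -> float:
    # Hypothetical merge score: negative squared distance, so similar neighbours merge first.
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def encode(units: List[Vector]) -> Node:
    """Greedily merge the best-scoring adjacent pair until one root remains."""
    nodes = [Node(vec=u, leaf_index=i) for i, u in enumerate(units)]
    while len(nodes) > 1:
        best = max(range(len(nodes) - 1),
                   key=lambda i: score(nodes[i].vec, nodes[i + 1].vec))
        left, right = nodes[best], nodes[best + 1]
        nodes[best:best + 2] = [Node(vec=compose(left.vec, right.vec),
                                     left=left, right=right)]
    return nodes[0]


def decode(node: Node, vec: Vector, out: Dict[int, Vector]) -> None:
    """Expand a vector top-down along the induced tree, collecting leaf reconstructions."""
    if node.leaf_index is not None:
        out[node.leaf_index] = vec
        return
    left_vec, right_vec = decompose(vec)
    decode(node.left, left_vec, out)
    decode(node.right, right_vec, out)


def reconstruction_loss(units: List[Vector]) -> Tuple[Node, float]:
    """Auto-encoding objective: squared error between input units and their reconstructions."""
    root = encode(units)
    recon: Dict[int, Vector] = {}
    decode(root, root.vec, recon)
    loss = sum((x - y) ** 2
               for i, unit in enumerate(units)
               for x, y in zip(unit, recon[i]))
    return root, loss


def brackets(node: Node) -> str:
    """Render the induced tree as nested brackets over leaf indices."""
    if node.leaf_index is not None:
        return str(node.leaf_index)
    return f"({brackets(node.left)} {brackets(node.right)})"
```

For example, `brackets(reconstruction_loss([[0.0], [0.1], [1.0], [1.1]])[0])` yields `((0 1) (2 3))`: the two pairs of similar units are grouped first. In a trained system the merge scores and compose/decompose cells would be learned so that trees which reconstruct the input well are preferred; here they are fixed placeholders chosen only so the sketch runs.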