CL | Apr 14, 2021

Predicting Discourse Trees from Transformer-based Neural Summarizers

arXiv:2104.07058v1729 citations
Originality: Incremental advance
AI Analysis

This work examines whether the known synergy between discourse and summarization runs in both directions. It is an incremental advance, since it builds on prior findings that discourse information benefits summarization.

The paper investigates whether pre-trained neural summarizers can infer document-level discourse trees, finding that they learn both dependency- and constituency-style discourse information, typically encoded in a single attention head, with results showing general and transferable inter-domain discourse learning.

Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional, by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head, covering long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
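To make the idea of reading a tree off a self-attention matrix concrete, here is a minimal sketch. The abstract does not say which tree-extraction algorithm is used, so the code assumes a CKY-style dynamic program over a single head's attention matrix, already aggregated to the level of discourse units. The function names `build_tree` and `span_score`, the span-scoring rule (mean attention inside the span), and the toy matrix are all illustrative assumptions, not the authors' implementation.

```python
from functools import lru_cache


def build_tree(attn):
    """Return an unlabeled binary constituency tree over units 0..n-1.

    attn: square list of lists; attn[i][j] is the attention weight from
    discourse unit i to unit j (assumed to come from one attention head).
    Leaves are integers; internal nodes are (left, right) tuples.
    """
    n = len(attn)

    def span_score(i, j):
        # Assumed scoring rule: mean attention among units i..j (inclusive).
        total = sum(attn[a][b] for a in range(i, j + 1) for b in range(i, j + 1))
        return total / ((j - i + 1) ** 2)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Returns (best cumulative score, best tree) for the span i..j.
        if i == j:
            return 0.0, i  # single units carry no span score
        candidates = []
        for k in range(i, j):  # split into i..k and k+1..j
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k + 1, j)
            candidates.append((left_score + right_score, k, (left_tree, right_tree)))
        # Highest score wins; ties are broken towards the leftmost split.
        score, _, tree = max(candidates, key=lambda c: (c[0], -c[1]))
        return score + span_score(i, j), tree

    return best(0, n - 1)[1]


# Toy matrix: units 0-1 attend to each other, as do units 2-3.
toy_attn = [
    [0.00, 0.90, 0.05, 0.05],
    [0.90, 0.00, 0.05, 0.05],
    [0.05, 0.05, 0.00, 0.90],
    [0.05, 0.05, 0.90, 0.00],
]
tree = build_tree(toy_attn)
print(tree)  # ((0, 1), (2, 3))
```

On the toy matrix the two tightly coupled pairs are grouped first and then joined at the root. A dependency-style tree would need a different decoder (for example, a maximum spanning tree over the same matrix), which this sketch does not cover.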

Code Implementations: 1 repo
