Mar 23, 2024

Modeling Unified Semantic Discourse Structure for High-quality Headline Generation

arXiv:2403.15776v1
h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses the challenge of summarizing lengthy documents into catchy headlines. The advance is incremental, as it builds on existing discourse and semantic representation techniques.

The authors tackled the problem of generating high-quality headlines by modeling a unified semantic discourse structure (S3) to capture core document semantics, and their method outperformed state-of-the-art approaches on two datasets.

Headline generation aims to summarize a long document with a short, catchy title that reflects the main idea. This requires accurately capturing the core document semantics, which is challenging because the texts are lengthy and rich in background information. In this work, we propose using a unified semantic discourse structure (S3) to represent document semantics, achieved by combining document-level rhetorical structure theory (RST) trees with sentence-level abstract meaning representation (AMR) graphs to construct S3 graphs. The hierarchical composition of sentence, clause, and word intrinsically characterizes the semantic meaning of the overall document. We then develop a headline generation framework, in which the S3 graphs are encoded as contextual features. To consolidate the efficacy of S3 graphs, we further devise a hierarchical structure pruning mechanism to dynamically screen out the redundant and nonessential nodes within the graph. Experimental results on two headline generation datasets demonstrate that our method consistently outperforms existing state-of-the-art methods. Our work can be instructive for a broad range of document modeling tasks beyond headline generation and summarization.
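
The idea of hanging sentence-level AMR concepts under document-level RST structure, then pruning nonessential nodes, can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the nested-tuple tree format, the hand-assigned `score` values, and the fixed `threshold` are all hypothetical stand-ins. In particular, the paper describes a dynamic pruning mechanism, whereas this sketch uses a simple static cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    level: str            # "discourse", "clause", or "concept" (hypothetical labels)
    score: float = 1.0    # hypothetical salience score, assigned by hand here
    children: list = field(default_factory=list)


def build_s3(rst_tree, amr_graphs):
    """Attach each clause's AMR concepts under the matching RST leaf.

    rst_tree: nested tuples (relation, [children]); leaves are clause ids (str).
    amr_graphs: clause id -> list of (concept, score) pairs.
    """
    if isinstance(rst_tree, str):  # leaf: a single clause
        leaves = [Node(c, "concept", s) for c, s in amr_graphs.get(rst_tree, [])]
        return Node(rst_tree, "clause", children=leaves)
    relation, children = rst_tree
    subtrees = [build_s3(child, amr_graphs) for child in children]
    return Node(relation, "discourse", children=subtrees)


def prune(node, threshold):
    """Drop concept nodes scoring below threshold, then any subtree left empty."""
    kept = []
    for child in node.children:
        if child.level == "concept":
            if child.score >= threshold:
                kept.append(child)
        else:
            pruned = prune(child, threshold)
            if pruned.children:
                kept.append(pruned)
    return Node(node.name, node.level, node.score, kept)


def concepts(node):
    """List the surviving concept names in left-to-right order."""
    if node.level == "concept":
        return [node.name]
    return [name for child in node.children for name in concepts(child)]


# Toy document: three clauses joined by two discourse relations.
rst = ("Elaboration", ["c1", ("Background", ["c2", "c3"])])
amr = {
    "c1": [("announce-01", 0.9), ("company", 0.8)],
    "c2": [("found-01", 0.2)],
    "c3": [("merge-01", 0.7), ("year", 0.1)],
}

s3 = build_s3(rst, amr)
pruned = prune(s3, threshold=0.5)
print(concepts(pruned))
```

In this toy input, clause `c2` carries only a low-scoring concept, so the whole clause disappears from the pruned graph, while the `Background` relation survives because `c3` still holds a salient concept. A real system would obtain the trees and graphs from RST and AMR parsers and would learn which nodes to drop rather than use a fixed cutoff.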

