CL · AI · Oct 25, 2024

GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization

arXiv:2410.21315v1 · 12 citations · h-index: 5 · Has Code · NAACL
Originality: Incremental advance
AI Analysis

This work addresses the need for more intuitive and efficient graph-based summarization models for long documents. It is an incremental advance, since it builds on existing heterogeneous graph neural network approaches.

The paper tackles long document extractive summarization by proposing GraphLSS, a heterogeneous graph construction that integrates lexical, structural, and semantic features without external tools. On two benchmark datasets, it performs competitively with the top graph-based methods.

Heterogeneous graph neural networks have recently gained attention for long document summarization, modeling the extraction as a node classification task. Although effective, these models often require external tools or additional machine learning models to define graph components, producing highly complex and less intuitive structures. We present GraphLSS, a heterogeneous graph construction for long document extractive summarization, incorporating Lexical, Structural, and Semantic features. It defines two levels of information (words and sentences) and four types of edges (sentence semantic similarity, sentence occurrence order, word in sentence, and word semantic similarity) without any need for auxiliary learning models. Experiments on two benchmark datasets show that GraphLSS is competitive with top-performing graph-based methods, outperforming recent non-graph models. We release our code on GitHub.
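
The graph layout described in the abstract (two node types, four edge types) can be illustrated with a minimal sketch. This is not the authors' released code. The function name `build_graph`, the thresholds, the bag-of-words cosine used for sentence similarity, and the caller-supplied `word_vectors` used for word similarity are all assumptions made for illustration; the abstract does not say how the similarities are computed.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(sentences, word_vectors, sent_threshold=0.3, word_threshold=0.8):
    """Build a toy heterogeneous graph with word and sentence nodes.

    Hypothetical helper: the thresholds and similarity measures are
    illustrative assumptions, not values taken from the paper.
    """
    tokens = [s.lower().split() for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    # Stand-in sentence representation: raw term counts.
    sent_vecs = [dict(Counter(t)) for t in tokens]

    edges = {
        "sent_similarity": [],  # sentence <-> sentence (semantic)
        "sent_order": [],       # sentence -> next sentence (structural)
        "word_in_sent": [],     # word -> sentence (lexical)
        "word_similarity": [],  # word <-> word (semantic)
    }

    for i, j in combinations(range(len(sentences)), 2):
        if cosine(sent_vecs[i], sent_vecs[j]) >= sent_threshold:
            edges["sent_similarity"].append((i, j))

    edges["sent_order"] = [(i, i + 1) for i in range(len(sentences) - 1)]

    for i, t in enumerate(tokens):
        for w in sorted(set(t)):
            edges["word_in_sent"].append((w, i))

    # Only words that have a supplied vector can receive similarity edges.
    for a, b in combinations(vocab, 2):
        if a in word_vectors and b in word_vectors:
            va = dict(enumerate(word_vectors[a]))
            vb = dict(enumerate(word_vectors[b]))
            if cosine(va, vb) >= word_threshold:
                edges["word_similarity"].append((a, b))

    return {
        "words": vocab,
        "sentences": list(range(len(sentences))),
        "edges": edges,
    }
```

In a setup like this, the extraction step would then classify each sentence node as in or out of the summary, which is the node classification framing the abstract mentions. The edge lists could be loaded into a heterogeneous GNN library of choice.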

