CL | AI | Oct 29, 2023

Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders

arXiv:2310.18992v1131 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving unsupervised extractive summarization for natural language processing applications, representing an incremental advancement by refining pre-training methods for better sentence representations.

The paper tackles the gap between pre-training and sentence-ranking in unsupervised extractive summarization by proposing a graph pre-training auto-encoder that models intra-sentential distinctive and inter-sentential cohesive features using sentence-word bipartite graphs, resulting in sentence representations that outperform BERT- or RoBERTa-based methods in downstream tasks.

Pre-trained sentence representations are crucial for identifying significant sentences in unsupervised extractive document summarization. However, the traditional two-step paradigm of pre-training followed by sentence ranking creates a gap, because the two steps have different optimization objectives. To address this issue, we argue that pre-trained embeddings derived from a process specifically designed to optimize cohesive and distinctive sentence representations help rank significant sentences. To this end, we propose a novel graph pre-training auto-encoder that obtains sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features through sentence-word bipartite graphs. These pre-trained sentence representations are then used in a graph-based ranking algorithm for unsupervised summarization. Our method achieves leading performance among unsupervised summarization frameworks by providing summary-worthy sentence representations, and it surpasses heavy BERT- or RoBERTa-based sentence representations in downstream tasks.
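
To make the sentence-word bipartite graph and the graph-based ranking step more concrete, here is a minimal sketch in Python. It is not the paper's method: the learned graph auto-encoder embeddings are replaced by plain TF-IDF edge weights, and ranking uses simple degree centrality over cosine similarity. The function names (`build_bipartite`, `sentence_similarity`, `rank_sentences`), whitespace tokenization, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_bipartite(sentences):
    """Build a sentence-word bipartite graph as a weight matrix.

    weights[i][j] is the edge weight between sentence node i and word node j.
    Assumption: TF-IDF weights stand in for learned edge features.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    n = len(sentences)
    # Document frequency: number of sentences containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    weights = [[0.0] * len(vocab) for _ in range(n)]
    for i, toks in enumerate(tokenized):
        for w, tf in Counter(toks).items():
            # Smoothed IDF keeps every existing edge strictly positive.
            idf = math.log((1 + n) / (1 + df[w])) + 1.0
            weights[i][index[w]] = tf * idf
    return vocab, weights


def sentence_similarity(weights):
    """Cosine similarity between sentences through shared word nodes."""
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in weights]
    n = len(weights)
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                dot = sum(x * y for x, y in zip(weights[a], weights[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


def rank_sentences(sentences, k=1):
    """Return indices of the k most central sentences, in document order."""
    _, weights = build_bipartite(sentences)
    sim = sentence_similarity(weights)
    # Degree centrality: total similarity to all other sentences.
    centrality = [sum(row) for row in sim]
    order = sorted(range(len(sentences)), key=lambda i: -centrality[i])
    return sorted(order[:k])


if __name__ == "__main__":
    doc = [
        "graph models rank sentences",
        "graph models embed sentences",
        "cats sleep",
    ]
    print(rank_sentences(doc, k=2))
```

In this sketch, sentences that share many word nodes score as cohesive, while rare words receive larger weights and so make a sentence more distinctive. A learned encoder over the same bipartite structure would replace the TF-IDF rows with trained sentence embeddings before the ranking step.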

Code Implementations: 1 repo
