CL Oct 4, 2021

Leveraging Information Bottleneck for Scientific Document Summarization

Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, Shirui Pan

arXiv:2110.01280v130.9667 citations h-index: 70

Originality: Incremental advance

AI Analysis

This work addresses summarization for scientific documents, but it is incremental as it builds on existing Information Bottleneck methods for sentence compression.

The paper tackles unsupervised extractive summarization of long scientific documents by extending the Information Bottleneck principle to the document level, using signals as queries to retrieve key content and a pre-trained language model for sentence search and edit. It reports effectiveness on three datasets, with human evaluation indicating better coverage of content aspects than previous systems.

This paper presents an unsupervised extractive approach to summarizing long scientific documents based on the Information Bottleneck principle. Inspired by previous work that uses the Information Bottleneck principle for sentence compression, we extend it to document-level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and edit to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by using different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. Further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.
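The first step described above, using a signal as a query to pull key sentences out of the source document, can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity here is a simple stand-in for the paper's Information Bottleneck scoring, and the function names (tokenize, cosine, retrieve_by_signal), the parameter k, and the sample document are all hypothetical. The second step (search and edit with a pre-trained language model) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_by_signal(sentences, signal, k=2):
    """Return the k sentences most similar to the signal, in document order.

    Assumption: similarity to the signal is a crude proxy for relevance;
    the paper's actual scoring is not reproduced here.
    """
    query = Counter(tokenize(signal))
    scored = [(cosine(Counter(tokenize(s)), query), i)
              for i, s in enumerate(sentences)]
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    # Restore the original sentence order for readability.
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


# Hypothetical toy document and signal, for illustration only.
document = [
    "We study summarization of long scientific documents.",
    "The weather was pleasant during the conference.",
    "Our method retrieves key sentences using query signals.",
    "Lunch was served at noon.",
]
selected = retrieve_by_signal(
    document, "summarization of scientific documents with query signals", k=2
)
print(selected)
```

Running the retrieval once per signal and merging the results would be one simple way to approximate the multi-view extension the abstract mentions, though how the paper combines views is not stated here.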
