CL · Apr 6, 2020

At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization

arXiv:2004.02664v2 · 31.0994 citations

Originality: Incremental advance

AI Analysis

This work, aimed at NLP researchers, addresses redundancy in extractive summarization. It is an incremental advance: it builds on existing extractive methods and changes the unit of extraction.

The paper studies extractive document summarization and asks whether extracting sub-sentential units instead of full sentences reduces redundancy and improves performance. It finds that sub-sentential extraction performs competitively under both automatic and human evaluation.

Extractive methods have proven effective in automatic document summarization. Previous work performs this task by identifying informative content at the sentence level. However, it is unclear whether sentence-level extraction is the best solution. In this work, we show that extracting full sentences introduces unnecessary and redundant content, and that extracting sub-sentential units is a promising alternative. Specifically, we propose extracting sub-sentential units based on the constituency parse tree, and we present a neural extractive model that leverages this sub-sentential information to extract such units. Extensive experiments and analyses show that extracting sub-sentential units performs competitively with full-sentence extraction under both automatic and human evaluation. We hope this work provides some inspiration on the choice of basic extraction unit in extractive summarization for future research.
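The abstract says units are derived from the constituency parse tree but does not state the splitting rule. As a minimal sketch, assuming a Penn-Treebank-style parse is already available and using a hypothetical rule ("every SBAR subtree becomes its own unit"), sub-sentential units could be obtained as below. The example sentence, its hand-written parse, and the names `parse_bracketed`, `split_units`, `SPLIT_LABELS`, and `PUNCT_LABELS` are illustrative assumptions, not the paper's method or code.

```python
from typing import List, Tuple, Union

# A constituency tree node is (label, children); leaves are token strings.
Node = Union[str, Tuple[str, list]]

# Hypothetical splitting rule (not from the paper): each SBAR subtree is a unit.
SPLIT_LABELS = frozenset({"SBAR"})
# Punctuation preterminals are dropped so the units read cleanly.
PUNCT_LABELS = frozenset({",", ".", ":", "``", "''"})


def parse_bracketed(text: str) -> Node:
    """Parse a bracketed tree such as '(ROOT (S ...))' into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf token
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return read()


def split_units(tree: Node, split_labels=SPLIT_LABELS) -> List[str]:
    """Return the main-clause unit first, then one unit per split subtree."""
    units: List[List[str]] = [[]]  # units[0] collects main-clause tokens

    def visit(node: Node, unit: List[str]) -> None:
        if isinstance(node, str):
            unit.append(node)
            return
        label, children = node
        if label in PUNCT_LABELS:
            return
        if label in split_labels:
            unit = []  # tokens under this subtree go to a fresh unit
            units.append(unit)
        for child in children:
            visit(child, unit)

    visit(tree, units[0])
    return [" ".join(u) for u in units if u]


SENTENCE = (
    "(ROOT (S (NP (NP (DT The) (NN model)) (, ,) "
    "(SBAR (WHNP (WDT which)) (S (VP (VBD was) (VP (VBN trained) "
    "(PP (IN on) (NP (NN news) (NNS articles))))))) (, ,)) "
    "(VP (VBZ outperforms) (NP (DT the) (NN baseline)) "
    "(SBAR (IN because) (S (NP (PRP it)) (VP (VBZ removes) "
    "(NP (JJ redundant) (NNS clauses)))))) (. .)))"
)

for unit in split_units(parse_bracketed(SENTENCE)):
    print(unit)
# The model outperforms the baseline
# which was trained on news articles
# because it removes redundant clauses
```

Cutting at SBAR separates relative and subordinate clauses, so an extractor could keep the main clause while leaving out a clause that adds little. The paper's actual unit definition may differ, and a practical rule set would likely also need to handle cases such as coordinated clauses.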

