Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT
This work aims to improve the quality of extractive summarization, for both NLP research and downstream applications, by moving beyond sentence-level extraction.
This paper addresses the gap between human-written summaries and oracle sentence labels in extractive summarization by proposing to extract fact-level semantic units. The model incorporates a hierarchical structure and uses a hierarchical graph mask with BERT, achieving state-of-the-art results on the CNN/DailyMail dataset.
Most current extractive summarization models generate summaries by selecting salient sentences. However, sentence-level extraction suffers from a gap between the human-written gold summary and the oracle sentence labels. In this paper, we propose to extract fact-level semantic units for better extractive summarization. We also introduce a hierarchical structure that incorporates multiple levels of textual granularity into the model. In addition, we integrate our model with BERT using a hierarchical graph mask. This lets us combine BERT's natural language understanding ability with the structural information without increasing the scale of the model. Experiments on the CNN/DailyMail dataset show that our model achieves state-of-the-art results.
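The abstract does not say how the hierarchical graph mask is built. The sketch below is one hypothetical construction, not the authors' method: it assumes the input sequence is laid out as tokens, then one node per fact, then one node per sentence, then a single document node, and that attention is allowed only between a node and itself, between tokens of the same fact, and along parent/child edges of the hierarchy (token to fact, fact to sentence, sentence to document). Because it only changes which positions may attend to which, a mask like this adds no parameters, which matches the claim that the structure is injected without enlarging the model. The function name `hierarchical_graph_mask` and the node layout are illustrative assumptions.

```python
from typing import List


def hierarchical_graph_mask(token_fact: List[int], fact_sent: List[int]) -> List[List[int]]:
    """Hypothetical 0/1 attention mask over [tokens | facts | sentences | document].

    token_fact[i] is the fact index of token i; fact_sent[f] is the sentence
    index of fact f. This layout is an assumption for illustration only.
    mask[i][j] == 1 means position i may attend to position j.
    """
    n_tok = len(token_fact)
    n_fact = len(fact_sent)
    n_sent = max(fact_sent) + 1 if fact_sent else 0
    fact_off = n_tok              # first fact-node position
    sent_off = fact_off + n_fact  # first sentence-node position
    doc = sent_off + n_sent       # single document-node position
    size = doc + 1
    mask = [[0] * size for _ in range(size)]

    def connect(a: int, b: int) -> None:
        # Hierarchy edges are undirected: parent and child see each other.
        mask[a][b] = 1
        mask[b][a] = 1

    for i in range(size):
        mask[i][i] = 1  # every node attends to itself
    for i in range(n_tok):
        for j in range(n_tok):
            if token_fact[i] == token_fact[j]:
                mask[i][j] = 1  # tokens inside the same fact attend to each other
        connect(i, fact_off + token_fact[i])  # token <-> its fact node
    for f, s in enumerate(fact_sent):
        connect(fact_off + f, sent_off + s)  # fact <-> its sentence node
    for s in range(n_sent):
        connect(sent_off + s, doc)  # sentence <-> document node
    return mask


# Five tokens split into three facts; facts 0 and 1 form sentence 0, fact 2 is sentence 1.
mask = hierarchical_graph_mask([0, 0, 1, 1, 2], [0, 0, 1])
```

In BERT-style implementations, a 0/1 mask of this kind is usually converted into an additive bias (zero for allowed positions, a large negative value for blocked ones) and added to the attention scores before the softmax, so the pretrained weights are reused unchanged.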