CL · Aug 25, 2017

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

arXiv:1708.07690v139.31095 citations

Originality: Incremental advance

AI Analysis

This work provides a strong baseline for multi-document summarization, though it is incremental as it builds on an existing centroid-based model.

The authors tackled multi-document summarization by adapting a centroid-based method to rank summaries instead of sentences, achieving performance on par with complex state-of-the-art methods on the DUC2004 dataset.

The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show possibilities to scale up to larger input document collections by selecting a small number of sentences from each document prior to constructing the summary. Experiments were done on the DUC2004 dataset for multi-document summarization. We observe a higher performance over the original model, on par with more complex state-of-the-art methods.
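The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the smoothed TF-IDF weighting, the word budget, the stopping rule, and all function names (`tokenize`, `tfidf_vectors`, `preselect`, `greedy_summary`) are assumptions made for illustration. In particular, `preselect` simply takes the first few sentences of each document, which is only one possible way to pick a small number of sentences per document.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(sentence):
    # Naive whitespace tokenizer (assumption; any tokenizer could be used).
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]

def tfidf_vectors(sentences):
    # One sparse TF-IDF vector (dict) per sentence.
    # IDF is computed over sentences and smoothed with +1 (assumption).
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    df = Counter(w for bag in bags for w in bag)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in bag.items()} for bag in bags]

def add(u, v):
    # Element-wise sum of two sparse vectors.
    out = dict(u)
    for w, x in v.items():
        out[w] = out.get(w, 0.0) + x
    return out

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def preselect(documents, n_first=2):
    # Keep a few sentences per document before summarizing.
    # Taking the first n is an assumption, not the paper's stated criterion.
    return [s for doc in documents for s in doc[:n_first]]

def greedy_summary(sentences, max_words=20):
    vectors = tfidf_vectors(sentences)
    centroid = {}
    for v in vectors:
        centroid = add(centroid, v)

    chosen, summary_vec, score, length = [], {}, 0.0, 0
    while True:
        best = None
        for i, v in enumerate(vectors):
            n_words = len(tokenize(sentences[i]))
            if i in chosen or length + n_words > max_words:
                continue
            # Score the whole candidate summary, not the single sentence.
            cand = cosine(add(summary_vec, v), centroid)
            if cand > score and (best is None or cand > best[0]):
                best = (cand, i, n_words)
        if best is None:  # nothing fits the budget or improves the score
            break
        score, i, n_words = best
        chosen.append(i)
        summary_vec = add(summary_vec, vectors[i])
        length += n_words
    return [sentences[i] for i in sorted(chosen)]
```

Given `documents` as a list of sentence lists, a call such as `greedy_summary(preselect(documents), max_words=100)` returns the selected sentences in input order. Because each candidate is scored as part of the growing summary, a sentence that only repeats already-covered words tends to add little to the cosine score, which is the practical difference from ranking sentences one by one.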
