CL IR LG · Oct 1, 2019

Global Voices: Crossing Borders in Automatic News Summarization

arXiv:1910.00421v430.1997 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for evaluation data in cross-lingual summarization, but it is incremental as it focuses on dataset creation and metric analysis.

The authors constructed Global Voices, a multilingual dataset for evaluating cross-lingual summarization methods across 15 languages, and found that the ROUGE metric has limitations in this context.

We construct Global Voices, a multilingual dataset for evaluating cross-lingual summarization methods. We extract social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages. In particular, for the into-English summarization task, we crowd-source a high-quality evaluation dataset based on guidelines that emphasize accuracy, coverage, and understandability. To ensure the quality of this dataset, we collect human ratings to filter out bad summaries and conduct a human survey, which shows that the remaining summaries are preferred over the social-network summaries. We study the effect of translation quality in cross-lingual summarization, comparing a translate-then-summarize approach with several baselines. Our results highlight limitations of the ROUGE metric that are overlooked in monolingual summarization. Our dataset is available for download at https://forms.gle/gpkJDT6RJWHM1Ztz9 .
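The abstract does not spell out which ROUGE limitations the authors observed. As general background only, ROUGE-N scores a candidate summary by its n-gram overlap with a reference, so it rewards shared wording rather than shared meaning. The sketch below is a simplified unigram-overlap F1 in Python, assuming lowercased whitespace tokenization and no stemming. The function name `rouge1_f1` and the example sentences are hypothetical illustrations; this is not the paper's evaluation code or the official ROUGE toolkit.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings.

    Assumption: tokens are lowercased and split on whitespace, with no
    stemming or stopword handling.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, not drawn from the Global Voices dataset.
reference = "the government shut down the internet during protests"
verbatim = "the government shut down the internet during protests"
paraphrase = "authorities blocked web access amid demonstrations"

print(rouge1_f1(verbatim, reference))    # identical wording
print(rouge1_f1(paraphrase, reference))  # similar meaning, no shared words
```

Under these assumptions the paraphrase shares no tokens with the reference and scores zero despite conveying a similar meaning, which is one well-known way a purely lexical metric can diverge from human judgments.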
