CL · Jun 2, 2021

Evaluating the Efficacy of Summarization Evaluation across Languages

Fajri Koto, Jey Han Lau, Timothy Baldwin

arXiv:2106.01478v131.6713 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses whether automatic summarization evaluation metrics developed for English remain effective when applied to other languages, providing the first systematic quantification of their efficacy across languages.

The study manually annotated generated summaries in eight languages for focus (precision) and coverage (recall), then used those annotations to evaluate 19 automatic summarization evaluation metrics. BERTScore with multilingual BERT performed well across all eight languages, at a level above that for English.

While automatic summarization evaluation methods developed for English are routinely applied to other languages, this is the first attempt to systematically quantify their panlinguistic efficacy. We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall). Based on this, we evaluate 19 summarization evaluation metrics, and find that using multilingual BERT within BERTScore performs well across all languages, at a level above that for English.
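As a rough illustration of how a BERTScore-style metric yields separate precision and recall values, which correspond to the focus and coverage dimensions above, the sketch below applies greedy cosine matching to token vectors. It is not the authors' implementation: the two-dimensional vectors are toy stand-ins for multilingual BERT embeddings, `greedy_match_scores` is a hypothetical helper name, and IDF weighting and baseline rescaling are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) from greedy cosine matching.

    Hypothetical helper: each candidate token is matched to its most
    similar reference token (precision), and each reference token to
    its most similar candidate token (recall).
    """
    sims = [[cosine(c, r) for r in reference_vecs] for c in candidate_vecs]
    # Precision: how well each summary token is supported by the reference.
    precision = sum(max(row) for row in sims) / len(candidate_vecs)
    # Recall: how well each reference token is covered by the summary.
    recall = sum(
        max(sims[i][j] for i in range(len(candidate_vecs)))
        for j in range(len(reference_vecs))
    ) / len(reference_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy embeddings (assumed values, for illustration only).
reference = [[1.0, 0.0], [0.0, 1.0]]  # two reference tokens
candidate = [[1.0, 0.0]]              # one summary token

precision, recall, f1 = greedy_match_scores(candidate, reference)
print(precision, recall, f1)
```

In this toy case the single summary token matches one reference token exactly, so precision is high while recall is lower because the second reference token is left uncovered.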
