CL · AI · May 25, 2023

Revisiting non-English Text Simplification: A Unified Multilingual Benchmark

arXiv:2305.15678v1233 citations
Originality: Incremental advance
AI Analysis

The paper addresses the limited multilingual evaluation available to text simplification researchers. The advance is incremental, since it builds on existing resources and methods.

The paper tackles the lack of diverse multilingual benchmarks for text simplification by introducing MultiSim, a collection of 27 resources in 12 languages with over 1.7 million complex-simple sentence pairs. Its experiments show that multilingual training improves performance in non-English settings, that Russian performs strongly in zero-shot cross-lingual transfer to low-resource languages, and that few-shot prompting with BLOOM-176b outperforms fine-tuned models in most languages.

Recent advancements in high-quality, large-scale English resources have pushed the frontier of English Automatic Text Simplification (ATS) research. However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark that covers complex-simple sentence pairs in many languages. This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs. This benchmark will encourage research in developing more effective multilingual text simplification models and evaluation metrics. Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings. We observe strong performance from Russian in zero-shot cross-lingual transfer to low-resource languages. We further show that few-shot prompting with BLOOM-176b achieves comparable quality to reference simplifications, outperforming fine-tuned models in most languages. We validate these findings through human evaluation.

Code Implementations: 1 repo
