CL · Aug 25, 2025

German4All -- A Dataset and Model for Readability-Controlled Paraphrasing in German

arXiv:2508.17973v2 · 2 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for accessible German texts tailored to diverse reader groups, although the contribution is incremental because it builds on existing paraphrasing and simplification methods.

The authors tackled the problem of generating German paraphrases at different readability levels by introducing German4All, a dataset of over 25,000 aligned paragraph-level paraphrases across five levels, synthesized with GPT-4 and evaluated through human and LLM judgments. They trained an open-source model that achieves state-of-the-art performance in German text simplification.

The ability to paraphrase texts across different complexity levels is essential for creating accessible texts that can be tailored toward diverse reader groups. Thus, we introduce German4All, the first large-scale German dataset of aligned, readability-controlled, paragraph-level paraphrases. It spans five readability levels and comprises over 25,000 samples. The dataset is automatically synthesized using GPT-4 and rigorously evaluated through both human and LLM-based judgments. Using German4All, we train an open-source, readability-controlled paraphrasing model that achieves state-of-the-art performance in German text simplification, enabling more nuanced and reader-specific adaptations. We open-source both the dataset and the model to encourage further research on multi-level paraphrasing.
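To make "aligned, readability-controlled paraphrases" concrete, here is a minimal Python sketch, assuming each sample pairs one source paragraph with up to five level-specific paraphrases. The `AlignedSample` class, its field names, and the `<level_N>` control prefix are hypothetical illustrations, not the released German4All schema or the authors' training format; prefixing a control token is just one common way to condition a single model on a target readability level.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative, not the released schema.
@dataclass
class AlignedSample:
    source: str                  # original German paragraph
    paraphrases: dict[int, str]  # readability level (1-5, scale direction assumed) -> paraphrase


def to_training_pairs(sample: AlignedSample, levels=range(1, 6)) -> list[tuple[str, str]]:
    """Flatten one aligned sample into (input, target) pairs, one per available level.

    The target level is prepended as a hypothetical control prefix so that one
    seq2seq model can be asked for a specific readability level at inference time.
    Levels missing from the sample are skipped.
    """
    pairs = []
    for level in levels:
        if level in sample.paraphrases:
            prompt = f"<level_{level}> {sample.source}"
            pairs.append((prompt, sample.paraphrases[level]))
    return pairs


if __name__ == "__main__":
    sample = AlignedSample(
        source="Die Photosynthese wandelt Lichtenergie in chemische Energie um.",
        paraphrases={1: "Pflanzen machen aus Licht Energie.", 3: "Pflanzen nutzen Licht, um Energie zu speichern."},
    )
    for prompt, target in to_training_pairs(sample):
        print(prompt, "->", target)
```

Because every level of a sample shares the same source paragraph, one aligned sample yields several training pairs that differ only in the control prefix and the target text.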

