CL, Apr 16, 2019

Subjective Assessment of Text Complexity: A Dataset for German Language

arXiv:1904.07733v1, 42 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a resource for German language processing, but the contribution is incremental: it focuses on dataset creation and introduces no new methods.

The authors introduced TextComplexityDE, a dataset of 1000 German sentences from Wikipedia with subjective complexity ratings from learners and manual simplifications, to support text-complexity prediction and simplification models.

This paper presents TextComplexityDE, a dataset of 1000 German sentences taken from 23 Wikipedia articles in 3 different article genres, intended for developing text-complexity prediction models and automatic text simplification for German. The dataset includes subjective assessments of different text-complexity aspects provided by learners of German at levels A and B. In addition, it contains manual simplifications of 250 of those sentences provided by native speakers, along with subjective assessments of the simplified sentences by participants from the target group. The subjective ratings were collected through both laboratory studies and crowdsourcing.
