CL | Sep 19, 2019

A Corpus for Automatic Readability Assessment and Text Simplification of German

arXiv:1909.09067v1999 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better natural language processing resources for German, specifically for readability assessment and text simplification. Its contribution is incremental, since it extends an existing corpus standard.

The authors address the lack of a comprehensive German corpus for automatic readability assessment and text simplification by compiling approximately 211,000 sentences from web sources. The corpus adds novel information on text structure, typography, and images that machine learning approaches can exploit.

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification of German. The corpus is compiled from web sources and consists of approximately 211,000 sentences. As a novel contribution, it contains information on text structure, typography, and images, which can be exploited as part of machine learning approaches to readability assessment and text simplification. The focus of this publication is on representing such information as an extension to an existing corpus standard.

