IR, CL · Mar 11, 2017

A German Corpus for Text Similarity Detection Tasks

arXiv:1703.03923v1 · 9 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a domain-specific resource for natural language processing researchers and practitioners working on German text similarity. Its contribution is incremental, since it applies existing methods to new data.

The authors address the lack of a German corpus for text similarity detection. They present a new German text corpus designed for automatically assessing the similarity between texts and for evaluating similarity measures on both whole documents and individual sentences. As initial results, they compute several simple measures on the corpus using a library of similarity functions.

Text similarity detection aims at measuring the degree of similarity between a pair of texts. Corpora available for text similarity detection are designed to evaluate algorithms that assess the paraphrase level among documents. In this paper we present a textual German corpus for similarity detection. The purpose of this corpus is to automatically assess the similarity between a pair of texts and to evaluate different similarity measures, both for whole documents and for individual sentences. To this end, we have calculated several simple measures on our corpus based on a library of similarity functions.
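
The abstract does not say which measures or which library were used, so the following is only a minimal sketch of what "simple measures" over a pair of texts could look like. It assumes two common token-based measures, Jaccard overlap and cosine similarity over word counts. The function names `tokenize`, `jaccard`, and `cosine` are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into word tokens; \w is Unicode-aware in
    # Python 3, so German umlauts and ß stay inside their tokens.
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    # Assumed measure: shared distinct tokens divided by all distinct tokens.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    # Assumed measure: cosine of the angle between raw term-count vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    s1 = "Der Hund schläft"
    s2 = "Der Hund bellt"
    print(jaccard(s1, s2), cosine(s1, s2))
```

The same functions apply unchanged to a pair of sentences or a pair of whole documents, which mirrors the two granularities the corpus targets. Scores from such measures would then be compared against the corpus's similarity annotations.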
