CL AI DL Mar 23, 2021

Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection

Jan Philip Wahle, Terry Ruas, Norman Meuschke, Bela Gipp

arXiv:2103.12450v53.041 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of academic integrity for educators and researchers by providing a benchmark for paraphrase detection, though its contribution is incremental because it builds on existing language models and detection methods.

The paper tackles the problem of detecting machine-generated paraphrases that threaten academic integrity by creating a benchmark of paraphrased articles using Transformer-based language models, and provides classification experiments with state-of-the-art systems along with publicly available data.

The rise of language models such as BERT allows for high-quality text paraphrasing. This is a problem for academic integrity, as it is difficult to differentiate between original and machine-generated content. We propose a benchmark consisting of articles paraphrased using recent language models relying on the Transformer architecture. Our contribution fosters future research on paraphrase detection systems, as it offers a large collection of aligned original and paraphrased documents, a study regarding its structure, and classification experiments with state-of-the-art systems. We make our findings publicly available.
