CL · May 29, 2018

Semantically-informed distance and similarity measures for paraphrase plagiarism identification

Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor-Pineda

arXiv:1805.11611v1

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of detecting intentionally modified plagiarized text, with applications in academic integrity and content verification. It represents an incremental improvement.

The paper tackles paraphrase plagiarism identification by introducing two new semantically-informed measures of text relatedness, which achieve results competitive with state-of-the-art methods while using a simpler approach.

Paraphrase plagiarism identification is a very complex task because plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures can extract semantic information from either an external resource or a distributed representation of words, yielding informative features for training a supervised classifier to detect paraphrase plagiarism. The results indicate that the proposed measures are consistently good at detecting different types of paraphrase plagiarism. In addition, the results are very competitive with state-of-the-art methods, with the advantage of being a much simpler but equally effective solution.
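To make the idea of the two measures concrete, here is a minimal sketch of how a semantically-informed edit distance and a semantically-informed similarity could be computed over word sequences. It is not the authors' exact formulation: the substitution cost (1 minus word similarity), the symmetric soft-overlap formula, the function names, and the toy word vectors are all assumptions made for illustration. In practice, the word similarity would come from an external resource or from trained word embeddings.

```python
import math

# Hypothetical toy word vectors, standing in for real embeddings.
TOY_VECTORS = {
    "car": (0.9, 0.1, 0.0),
    "automobile": (0.85, 0.15, 0.05),
    "fast": (0.1, 0.9, 0.1),
    "quick": (0.15, 0.85, 0.1),
    "banana": (0.0, 0.1, 0.95),
}


def word_similarity(a, b, vectors=TOY_VECTORS):
    """Cosine similarity clipped to [0, 1]; identical words score 1, unknown words 0."""
    if a == b:
        return 1.0
    if a not in vectors or b not in vectors:
        return 0.0
    va, vb = vectors[a], vectors[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return max(0.0, min(1.0, dot / norm)) if norm else 0.0


def semantic_edit_distance(s, t, sim=word_similarity):
    """Word-level Levenshtein distance where substituting a for b costs 1 - sim(a, b)."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                    # delete a
                curr[j - 1] + 1.0,                # insert b
                prev[j - 1] + (1.0 - sim(a, b)),  # substitute a with b
            ))
        prev = curr
    return prev[-1]


def soft_overlap_similarity(s, t, sim=word_similarity):
    """Symmetric soft overlap: each word is credited with its best match in the other text."""
    a, b = set(s), set(t)
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    left = sum(max(sim(x, y) for y in b) for x in a)
    right = sum(max(sim(x, y) for x in a) for y in b)
    return (left + right) / (len(a) + len(b))


def pair_features(s, t):
    """Feature vector for a text pair, suitable as input to any supervised classifier."""
    longest = max(len(s), len(t), 1)
    return (semantic_edit_distance(s, t) / longest, soft_overlap_similarity(s, t))
```

With these toy vectors, `pair_features(["fast", "car"], ["quick", "automobile"])` yields a small normalized distance and a high similarity even though the two texts share no words, which is the behavior a plain edit distance or exact word overlap would miss.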
