CL · Feb 10, 2017

Using Word Embedding for Cross-Language Plagiarism Detection

J. Ferrero, F. Agnes, L. Besacier, D. Schwab

arXiv:1702.03082v1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses cross-language plagiarism detection, but it is an incremental advance: it applies word embeddings to an existing problem.

The paper tackles cross-language plagiarism detection by proposing new methods based on word embeddings and combining them, reaching an overall F1 score of 89.15% for English-French similarity detection at chunk level.

This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the different proposed methods to verify their complementarity, and finally obtain an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
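To make contributions (a) and (b) more concrete, here is a minimal sketch of one generic way word embeddings can produce a cross-language similarity score: represent each text as the sum of its word vectors in a shared bilingual embedding space, compare the two sums with cosine similarity, and then fuse several methods' scores with a weighted average. The toy vectors, the function names, the method labels, and the weighted-average fusion are all illustrative assumptions, not the paper's exact methods, embeddings, or weights.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical bilingual embedding table: English and French words share one vector space.
# The values below are made up for illustration; real systems use trained bilingual embeddings.
Embeddings = Dict[str, List[float]]


def text_vector(tokens: Sequence[str], emb: Embeddings, dim: int) -> List[float]:
    """Sum the vectors of all in-vocabulary tokens; out-of-vocabulary tokens are skipped."""
    vec = [0.0] * dim
    for tok in tokens:
        if tok in emb:
            vec = [a + b for a, b in zip(vec, emb[tok])]
    return vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_similarity(src: Sequence[str], tgt: Sequence[str], emb: Embeddings, dim: int) -> float:
    """Bag-of-embeddings similarity between a source-language and a target-language text."""
    return cosine(text_vector(src, emb, dim), text_vector(tgt, emb, dim))


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical fusion: weighted average of per-method similarity scores."""
    total = sum(weights[m] for m in scores)
    if total == 0.0:
        raise ValueError("weights of the scored methods must not sum to zero")
    return sum(weights[m] * scores[m] for m in scores) / total


# Toy usage: an English sentence, its French translation, and an unrelated French word.
EMB: Embeddings = {
    "cat": [1.0, 0.0, 0.0], "chat": [0.9, 0.1, 0.0],
    "sleeps": [0.0, 1.0, 0.0], "dort": [0.1, 0.9, 0.0],
    "car": [0.0, 0.0, 1.0], "voiture": [0.0, 0.1, 0.9],
}
sim_translation = embedding_similarity(["cat", "sleeps"], ["chat", "dort"], EMB, 3)
sim_unrelated = embedding_similarity(["cat", "sleeps"], ["voiture"], EMB, 3)
fused = combine_scores({"embedding": sim_translation, "other_method": 0.5},
                       {"embedding": 2.0, "other_method": 1.0})
print(round(sim_translation, 3), round(sim_unrelated, 3), round(fused, 3))
```

Summing vectors ignores word order and syntax, which keeps the sketch short; a fusion step like `combine_scores` is one simple way to let complementary methods offset each other's weaknesses.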

