CL IR
Apr 3, 2023

A Simple and Effective Method of Cross-Lingual Plagiarism Detection

Karen Avetisyan, Arthur Malajyan, Tsolak Ghukasyan, Arutyun Avetisyan

arXiv:2304.01352v2
8 citations
h-index: 16

AI Analysis

This paper addresses plagiarism detection across multiple languages, including under-resourced ones, but it is incremental in that it builds on existing multilingual models.

The paper tackles cross-lingual plagiarism detection by proposing a method that uses multilingual thesauri and BERT models without machine translation, achieving state-of-the-art results for French, Russian, and Armenian languages.

We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The approach uses open multilingual thesauri for the candidate retrieval task and pre-trained multilingual BERT-based language models for the detailed analysis. At inference time the method relies on neither machine translation nor word sense disambiguation, and is therefore suitable for a large number of languages, including under-resourced ones. The effectiveness of the proposed approach is demonstrated on several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian.
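To make the candidate retrieval idea concrete, here is a minimal sketch of thesaurus-based retrieval without translation. It is not the authors' implementation: the toy thesaurus entries, the function names, and the use of Jaccard overlap as the score are all assumptions made for illustration. The detailed analysis stage with a multilingual BERT model is not shown.

```python
# Toy multilingual thesaurus (hypothetical data): surface word -> concept ID.
# A real system would load an open multilingual thesaurus instead.
THESAURUS = {
    "cat": "c1", "chat": "c1", "кошка": "c1",
    "house": "c2", "maison": "c2", "дом": "c2",
    "sleep": "c3", "dort": "c3", "спит": "c3",
    "river": "c4", "rivière": "c4",
}


def to_concepts(text):
    """Map each known word to its language-independent concept ID.

    Words missing from the thesaurus are dropped; no translation and
    no word sense disambiguation is performed.
    """
    return {THESAURUS[w] for w in text.lower().split() if w in THESAURUS}


def overlap(a, b):
    """Jaccard similarity between two concept sets (assumed scoring choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_candidates(suspicious, sources, top_k=2):
    """Return indices of up to top_k sources sharing concepts with the query."""
    query = to_concepts(suspicious)
    scored = [(overlap(query, to_concepts(s)), i) for i, s in enumerate(sources)]
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]


if __name__ == "__main__":
    sources = ["le chat dort dans la maison", "la rivière", "bonjour"]
    print(retrieve_candidates("the cat sleeps in the house", sources))
```

In a full pipeline, the retrieved candidates would then be passed to a second stage that compares passages more precisely, which the abstract describes as using multilingual BERT-based models.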
