CLAIIR · Apr 1, 2024

BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System

arXiv:2404.01582v2 · 7 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses plagiarism detection for educational and content-verification purposes. It is an incremental advance because it builds on existing methods such as BERT and Faiss.

The paper tackles the challenge of detecting high-level plagiarism by generating a dataset of 32,927 text pairs using GPT-3.5 and proposes a BERT-enhanced retrieval method that achieves high performance, with metrics such as 98.86% accuracy and 0.9888 F1 score.

Text plagiarism detection is a common natural language processing task that aims to determine whether a given text contains material plagiarized or copied from other texts. In existing research, detecting high-level plagiarism remains a challenge because high-quality datasets are lacking. In this paper, we propose a method for generating plagiarized text with GPT-3.5, which produces a plagiarism detection dataset of 32,927 text pairs covering a wide range of plagiarism methods and fills this gap in the research. We also propose a plagiarism identification method based on Faiss and BERT that is both efficient and accurate. Our experiments show that this model outperforms other models on several metrics, reaching 98.86% accuracy, 98.90% precision, 98.86% recall, and an F1 score of 0.9888. Finally, we provide a user-friendly demo platform that lets users upload a text library and take part in the plagiarism analysis intuitively.
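
The retrieval idea in the abstract (embed every document in a library, then look up the nearest neighbour of a submitted text) can be illustrated with a minimal sketch. This is not the paper's implementation: the `embed` function below is a toy bag-of-words stand-in for a BERT sentence embedding, the linear scan in `search` stands in for a Faiss index, and the names `build_index`, `search`, `is_plagiarized`, and the `threshold` value of 0.8 are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT embedding: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(library):
    """Embed every library document once (a vector index would store these)."""
    return [(doc, embed(doc)) for doc in library]


def search(index, query, k=1):
    """Return the k most similar library documents as (score, doc) pairs."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def is_plagiarized(index, query, threshold=0.8):
    """Flag the query if its best match meets the (assumed) threshold."""
    hits = search(index, query, k=1)
    if not hits:
        return False, None, 0.0
    score, source = hits[0]
    return score >= threshold, source, score
```

A call such as `is_plagiarized(build_index(library), submitted_text)` returns a flag, the closest library document, and its similarity score. In a setup closer to the one the abstract describes, the embeddings would presumably come from a BERT encoder and the scan would be replaced by a vector index, which is what makes retrieval over a large text library efficient.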

