IR, Jan 25, 2018

Analyzing Similarity in Mathematical Content To Enhance the Detection of Academic Plagiarism

arXiv:1801.08439v1, 3 citations

Originality: Synthesis-oriented

AI Analysis

This work tackles the problem of academic plagiarism detection for disciplines heavily reliant on mathematics, but it is incremental as it reviews and analyzes existing methods rather than introducing new ones.

The paper addresses a gap in plagiarism detection tools, which ignore mathematical content, by analyzing existing mathematical information retrieval approaches. It finds that syntax-based methods detect undisguised plagiarism well, while structure-based and hybrid methods show promise for disguised cases such as renamed identifiers. Two limitations remain: current approaches operate only at the formula level, and equivalence transformations cannot be detected in general.

Despite the effort put into the detection of academic plagiarism, it continues to be a ubiquitous problem spanning all disciplines. Various tools have been developed to assist human inspectors by automatically identifying suspicious documents. However, to our knowledge, none of these tools currently use mathematical content for their analysis. This is problematic, because mathematical content potentially represents a significant amount of the scientific contribution in academic documents. Hence, ignoring mathematical content limits the detection of plagiarism considerably, especially in disciplines with frequent use of mathematics. This paper aims to help close this gap by providing an overview of existing approaches in mathematical information retrieval and an analysis of their applicability for different possible cases of mathematical plagiarism. I find that whereas syntax-based approaches perform particularly well in detecting undisguised plagiarism, structure-based and hybrid approaches promise to also detect forms of disguised mathematical plagiarism, such as plagiarism with renamed identifiers. However, more research in this area is needed to enable the detection of more complex mathematical plagiarism: the scope of current approaches is restricted to the formula level, and an extension to the section level is needed. Additionally, the general detection of equivalence transformations is currently not feasible. Despite these remaining problems, I conclude that the presented approaches could already be used for a basic automated detection system targeting mathematical plagiarism and therefore enhance current plagiarism detection systems.
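
The contrast the abstract draws between syntax-based and structure-based matching can be illustrated with a toy sketch. This is not the paper's method or any specific system it surveys: the tokenizer, the function names, and the rule that single letters are identifiers are all assumptions made for illustration.

```python
import re

# Hypothetical tokenizer: runs of letters, runs of digits, or any single symbol.
TOKEN = re.compile(r"[A-Za-z]+|\d+|\S")


def tokenize(formula: str) -> list[str]:
    """Split a plain-text formula into tokens, ignoring whitespace."""
    return TOKEN.findall(formula)


def normalize_identifiers(tokens: list[str]) -> list[str]:
    """Replace each single-letter identifier with a placeholder (v0, v1, ...)
    numbered by order of first appearance. Assumption: single letters are
    identifiers; multi-letter tokens such as 'sin' are kept as-is."""
    mapping: dict[str, str] = {}
    result = []
    for tok in tokens:
        if tok.isalpha() and len(tok) == 1:
            mapping.setdefault(tok, f"v{len(mapping)}")
            result.append(mapping[tok])
        else:
            result.append(tok)
    return result


def syntax_match(a: str, b: str) -> bool:
    """Syntax-style comparison: the token sequences must be identical."""
    return tokenize(a) == tokenize(b)


def structure_match(a: str, b: str) -> bool:
    """Structure-style comparison: identical after identifier normalization."""
    return normalize_identifiers(tokenize(a)) == normalize_identifiers(tokenize(b))


original = "E = m * c^2"
renamed = "F = n * d^2"      # identifiers renamed
reordered = "E = c^2 * m"    # equivalence transformation (commutativity)

print(syntax_match(original, renamed))       # exact token match fails
print(structure_match(original, renamed))    # match after normalization
print(structure_match(original, reordered))  # reordering is still missed
```

In this sketch the exact comparison misses the renamed formula, the normalized comparison catches it, and neither recognizes the reordered but mathematically equivalent formula, which mirrors the limitation on equivalence transformations noted in the abstract.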
