SEIR, Oct 28, 2018

Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

arXiv:1810.11903v1, 3 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in source code plagiarism detection for developers and educators, but it is incremental as it builds on existing IR-based filtering methods.

The paper tackles the time inefficiency of source code plagiarism detection by proposing two dynamic thresholding mechanisms (range-based and pair-count-based) that automatically tune the similarity threshold based on the distribution of similarity degrees. According to the authors' evaluation, both mechanisms are more practical than manual threshold assignment because they are more proportional to efficiency improvement and effectiveness reduction.

To address time inefficiency in string-matching-based source code plagiarism detection, only potential pairs are compared. Potentiality is defined through a fast yet order-insensitive similarity measurement (adapted from Information Retrieval), and only pairs whose similarity degrees are higher than or equal to a particular threshold are selected. Defining such a threshold is not a trivial task, since the threshold should lead to high efficiency improvement and low effectiveness reduction (if reduction is unavoidable). This paper proposes two thresholding mechanisms, namely the range-based and pair-count-based mechanisms, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction.
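The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas for either mechanism, so everything below is an assumption made for illustration: cosine similarity over token frequencies stands in for the order-insensitive IR-based measure, the range-based threshold is taken as a fixed fraction of the span between the lowest and highest observed similarity, and the pair-count-based threshold is taken as the similarity of the k-th most similar pair. All function and parameter names (`range_based_threshold`, `ratio`, `keep`, and so on) are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Order-insensitive similarity: cosine over token frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(submissions):
    """Map every unordered pair of submission names to its similarity degree."""
    return {
        (a, b): cosine_similarity(submissions[a], submissions[b])
        for a, b in combinations(sorted(submissions), 2)
    }


def range_based_threshold(similarities, ratio):
    """ASSUMED form: threshold sits `ratio` of the way from the minimum
    to the maximum observed similarity (requires at least one pair)."""
    values = list(similarities.values())
    low, high = min(values), max(values)
    return low + ratio * (high - low)


def pair_count_based_threshold(similarities, keep):
    """ASSUMED form: threshold equals the similarity of the `keep`-th most
    similar pair, so roughly `keep` pairs survive (more if there are ties)."""
    ranked = sorted(similarities.values(), reverse=True)
    keep = max(1, min(keep, len(ranked)))
    return ranked[keep - 1]


def select_potential_pairs(similarities, threshold):
    """Keep only pairs whose similarity is higher than or equal to the threshold."""
    return [pair for pair, degree in similarities.items() if degree >= threshold]


if __name__ == "__main__":
    # Toy token streams; a real pipeline would tokenize source files first.
    submissions = {
        "s1": ["a", "b", "c", "d"],
        "s2": ["d", "c", "b", "a"],  # same tokens as s1, different order
        "s3": ["a", "b", "x", "y"],
        "s4": ["x", "y", "z", "w"],
    }
    sims = pairwise_similarities(submissions)
    for name, threshold in [
        ("range-based", range_based_threshold(sims, ratio=0.75)),
        ("pair-count-based", pair_count_based_threshold(sims, keep=2)),
    ]:
        print(name, threshold, select_potential_pairs(sims, threshold))
```

Only the pairs returned by `select_potential_pairs` would then be passed to the slower string-matching comparison. In this sketch both thresholds are derived from the observed similarity distribution rather than fixed in advance, which is the property the abstract attributes to the two mechanisms.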

