SEOct 28, 2018
Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism DetectionOscar Karnalim, Lisan Sulistiani
To solve time inefficiency issue, only potential pairs are compared in string-matching-based source code plagiarism detection; wherein potentiality is defined through a fast-yet-order-insensitive similarity measurement (adapted from Information Retrieval) and only pairs which similarity degrees are higher or equal to a particular threshold is selected. Defining such threshold is not a trivial task considering the threshold should lead to high efficiency improvement and low effectiveness reduction (if it is unavoidable). This paper proposes two thresholding mechanisms---namely range-based and pair-count-based mechanism---that dynamically tune the threshold based on the distribution of resulted similarity degrees. According to our evaluation, both mechanisms are more practical to be used than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction.
SESep 23, 2018
Which Source Code Plagiarism Detection Approach is More Humane?Oscar Karnalim, Lisan Sulistiani
This paper contributes in developing source code plagiarism detection that is more aligned with human perspective. Three evaluation mechanisms that directly relate human perspective with evaluated approaches are proposed: think-aloud, aspect-oriented, and empirical mechanism. Using those mechanisms, a comparative study toward attribute-and structure-based plagiarism detection approach (i.e., two popular approach categories in source code plagiarism detection) is conducted. According to that study, structure-based approach is more effective than the attribute-based one; its signature aspect and resulted similarity degrees are more related to human preferences. In addition, such approach is related to most human-oriented aspects for suspecting source code plagiarism.