Stefanos Gritzalis

SE · Jan 29, 2021

The significance of user-defined identifiers in Java source code authorship identification

Georgia Frantzeskou, Stephen G. MacDonell, Efstathios Stamatatos et al.

When writing source code, programmers have varying levels of freedom in creating and using identifiers. Do they habitually use the same identifiers, and do those names differ from the names used by others? Is it then possible to tell who the author of a piece of code is by examining these identifiers? If so, can we use the presence or absence of identifiers to assist in correctly classifying programs to authors? Is it possible to hide the provenance of programs by identifier renaming? In this study, we assess the importance of three types of identifiers in source code author classification for two different Java program data sets. We do this through a sequence of experiments in which we disguise one type of identifier at a time. These experiments are performed using the Source Code Author Profiles (SCAP) method. The results show that, although identifiers examined as a whole do not seem to reflect program authorship for these data sets, when examined separately there is evidence that class names do signal the author of the program. In contrast, simple variables and method names used in Java programs do not appear to reflect program authorship; our analysis suggests that such identifiers are so common as to mask authorship. We believe that these results are applicable to the robustness of code plagiarism analysis and that the underlying methods could be valuable in cases of litigation arising from disputes over program authorship.
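The experimental idea described in this abstract (disguise one kind of identifier, then see whether attribution still works) can be illustrated with a small sketch. It is not the authors' implementation. It assumes a simplified reading of a profile-based approach: each program is reduced to its most frequent character n-grams, and two profiles are compared by the size of their intersection. The function names, the n-gram length `n`, the profile size `size`, and the regex-based renaming are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def profile(source: str, n: int = 3, size: int = 50) -> set:
    """Return the `size` most frequent character n-grams of `source`.

    Assumption: character n-grams stand in for whatever n-gram unit
    the real method uses; `n` and `size` are illustrative defaults.
    """
    grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in grams.most_common(size)}


def similarity(a: set, b: set) -> int:
    """Score two profiles by the number of n-grams they share."""
    return len(a & b)


def disguise(source: str, names: list) -> str:
    """Replace each listed identifier with a neutral placeholder.

    Whole-word regex matching is a crude stand-in for a real Java
    parser; it would also rewrite matches inside strings or comments.
    """
    for i, name in enumerate(names):
        source = re.sub(rf"\b{re.escape(name)}\b", f"id{i}", source)
    return source


def attribute(unknown: str, known: dict, n: int = 3, size: int = 50) -> str:
    """Return the candidate author whose profile overlaps most with `unknown`."""
    target = profile(unknown, n, size)
    return max(
        known,
        key=lambda author: similarity(target, profile(known[author], n, size)),
    )
```

To mimic one experiment, one would call `disguise` on every program with, say, only the class names listed, then rerun `attribute` and compare accuracy against the undisguised run. A drop in accuracy would suggest that this identifier type carries authorship signal.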

DB · Oct 9, 2017

SOPE: A Spatial Order Preserving Encryption Model for Multi-dimensional Data

Eirini Molla, Theodoros Tzouramanis, Stefanos Gritzalis

Due to the increasing demand for cloud services and the threat of privacy invasion, users are advised to encrypt their data before outsourcing it to a remote server. The safe storage and efficient retrieval of d-dimensional data on an untrusted server is therefore of crucial importance. The paper proposes a new encryption model that offers spatial order preservation for d-dimensional data (the SOPE model). It studies the operations for constructing the encrypted database and suggests algorithms that exploit the unique properties of the new model to efficiently execute a whole range of well-known queries over the encrypted d-dimensional data. The new model uses well-known database indices, such as the B+-tree and the R-tree, as backbone structures in their traditional form: it requires no modifications to them for loading the data or for efficiently executing the supporting query algorithms. An extensive experimental study, also presented in the paper, indicates the effectiveness and practicability of the proposed encryption model for real-life d-dimensional data applications.
