LG CV IR PL | Jan 30, 2021

ICodeNet -- A Hierarchical Neural Network Approach for Source Code Author Identification

Pranali Bora, Tulika Awalgaonkar, Himanshu Palve, Raviraj Joshi, Purvi Goel

arXiv:2102.00230v1 | 1.6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of code plagiarism and attribution in open-source communities, but its contribution is incremental because it builds on existing neural network architectures.

The paper tackles source code author identification by proposing ICodeNet, a hierarchical neural network that processes source code as images, achieving competitive results compared to simpler image-based and text-based models on a classification dataset.

With the open-source revolution, source code is more accessible than ever. This has, however, made it easier for malicious users and institutions to copy code without regard for its license or credit to the original author. Source code author identification is therefore an important task. In this paper, we propose ICodeNet, a hierarchical neural network that can be used for source code file-level tasks. ICodeNet processes source code in image format and is employed for the task of per-file author identification. It consists of an ImageNet-trained VGG encoder followed by a shallow neural network, which is based on either a CNN or an LSTM. Different variations of the model are evaluated on a source code author classification dataset. We also compare our image-based hierarchical neural network model with a simple image-based CNN architecture and with text-based CNN and LSTM models to highlight its novelty and efficiency.
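
The two-level data flow described in the abstract (an encoder applied to image input, then a shallow network on top of the encoder's features) can be illustrated with a small sketch. It is not the authors' implementation. Splitting the file image into fixed-height patches is an assumption, since the abstract does not say how the image is fed to the encoder. `toy_encoder` is a hypothetical stand-in for the ImageNet-trained VGG encoder, `mean_pool` is a hypothetical stand-in for the shallow CNN or LSTM, and `text_to_image` only imitates rendering code as an image. All function names are illustrative.

```python
from typing import List, Sequence

Image = List[List[int]]   # rows of pixel intensities in 0..255
Vector = List[float]


def text_to_image(source: str, width: int = 16) -> Image:
    """Crude stand-in for rendering code as an image (assumption):
    one row per line, one pixel per character, padded or cut to `width`."""
    rows = []
    for line in source.splitlines():
        pixels = [min(ord(ch), 255) for ch in line[:width]]
        pixels += [0] * (width - len(pixels))
        rows.append(pixels)
    return rows


def split_into_patches(image: Image, patch_height: int) -> List[Image]:
    """Cut the image into fixed-height patches (assumed, not from the paper).
    The last patch is padded with blank rows."""
    patches = []
    for start in range(0, len(image), patch_height):
        patch = image[start:start + patch_height]  # slice copies the row list
        while len(patch) < patch_height:
            patch.append([0] * len(image[0]))
        patches.append(patch)
    return patches


def toy_encoder(patch: Image) -> Vector:
    """Hypothetical stand-in for the pretrained VGG encoder:
    maps a patch to a fixed-size feature vector."""
    pixels = [p for row in patch for p in row]
    mean_intensity = sum(pixels) / (255 * len(pixels))
    ink_fraction = sum(1 for p in pixels if p) / len(pixels)
    return [mean_intensity, ink_fraction]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for the shallow CNN/LSTM:
    combines per-patch features into one file-level vector."""
    if not features:
        raise ValueError("need at least one patch feature vector")
    count = len(features)
    return [sum(f[i] for f in features) / count for i in range(len(features[0]))]


def encode_file(source: str, patch_height: int = 4) -> Vector:
    """Level 1: encode each patch. Level 2: aggregate into a file vector."""
    image = text_to_image(source)
    patches = split_into_patches(image, patch_height)
    return mean_pool([toy_encoder(p) for p in patches])
```

In a real model the file-level vector would feed a classification layer over the set of authors. Replacing `mean_pool` with a learned sequence model is where the CNN and LSTM variants mentioned in the abstract would differ.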
