Software Vulnerability Prediction Knowledge Transferring Between Programming Languages
This work addresses the problem of limited training data for software vulnerability detection across multiple programming languages; however, its contribution is incremental, as it builds on existing transfer learning methods.
The study tackles the lack of code samples for vulnerability detection across programming languages by proposing a transfer learning technique in which a CNN trained on C is adapted to Java, achieving an average recall of 72% in detecting vulnerabilities in both languages.
Developing automated, intelligent software vulnerability detection models has been receiving great attention from both the research and development communities. One of the biggest challenges in this area is the lack of sufficient code samples for every programming language. In this study, we address this issue by proposing a transfer learning technique that leverages available datasets to generate a model that detects common vulnerabilities in different programming languages. We use C source code samples to train a Convolutional Neural Network (CNN) model, and then use Java source code samples to adapt and evaluate the learned model. We use code samples from two benchmark datasets: the NIST Software Assurance Reference Dataset (SARD) and the Draper VDISC dataset. The results show that the proposed model detects vulnerabilities in both C and Java code with an average recall of 72%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.
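The abstract gives no implementation details, so the following is only a minimal, dependency-free sketch of the general recipe it describes: train a classifier on C samples, reuse the learned parameters, continue training on Java samples, and report recall per language together with its average (averaging over the two languages is one plausible reading of "average recall"). To stay runnable with the Python standard library alone, it swaps the paper's CNN for a logistic-regression classifier over token n-gram features; the n-gram windows only loosely mirror 1-D convolution filter widths. The token normalizer, the `KEEP` list, and every function name are hypothetical, the abstract does not say whether any layers were frozen (this sketch simply keeps training all weights), and the toy snippets are illustrative, not drawn from SARD or VDISC.

```python
import math
import re

# HYPOTHETICAL normalizer (not the paper's preprocessing): keep punctuation and a
# small illustrative set of risky APIs/keywords, map other identifiers to ID and
# literals to NUM/STR so C and Java snippets share one token vocabulary.
KEEP = {"strcpy", "memcpy", "gets", "malloc", "free", "executeQuery",
        "getRuntime", "exec", "if", "for", "while", "return", "new"}
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+|\w+|[^\s\w]')


def tokenize(code):
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok.startswith('"'):
            tokens.append("STR")
        elif tok.isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in KEEP else "ID")
        else:
            tokens.append(tok)
    return tokens


def features(tokens, n=2):
    # Token windows of width 1..n, loosely analogous to 1-D convolution
    # filter widths (nothing is convolved or learned here).
    return {" ".join(tokens[i:i + w])
            for w in range(1, n + 1) for i in range(len(tokens) - w + 1)}


class LogisticModel:
    """Stand-in for the paper's CNN: logistic regression on n-gram features."""

    def __init__(self):
        self.w, self.b = {}, 0.0

    def score(self, feats):
        z = self.b + sum(self.w.get(f, 0.0) for f in feats)
        z = max(-30.0, min(30.0, z))  # keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, samples, epochs=50, lr=0.5):
        # Plain SGD on log loss; a second call continues from current weights.
        for _ in range(epochs):
            for feats, label in samples:
                g = self.score(feats) - label  # d(log loss)/dz
                self.b -= lr * g
                for f in feats:
                    self.w[f] = self.w.get(f, 0.0) - lr * g
        return self

    def predict(self, feats, threshold=0.5):
        return int(self.score(feats) >= threshold)


def recall(y_true, y_pred):
    # TP / (TP + FN); returns 0.0 when there are no positive labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return tp / positives if positives else 0.0


def transfer(source, target, src_epochs=50, tgt_epochs=20):
    # Train on source-language (C) samples, then fine-tune the same weights
    # on target-language (Java) samples (warm start, no frozen layers).
    def encode(data):
        return [(features(tokenize(c)), y) for c, y in data]

    model = LogisticModel().fit(encode(source), epochs=src_epochs)
    return model.fit(encode(target), epochs=tgt_epochs)


def per_language_recall(model, test_sets):
    # Recall for each language plus the unweighted mean across languages.
    scores = {}
    for lang, samples in test_sets.items():
        y_true = [y for _, y in samples]
        y_pred = [model.predict(features(tokenize(c))) for c, _ in samples]
        scores[lang] = recall(y_true, y_pred)
    return scores, sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy snippets for illustration only (not from SARD or VDISC).
    c_train = [("strcpy(buf, input);", 1), ("gets(line);", 1),
               ("strncpy(buf, input, sizeof(buf) - 1);", 0),
               ("fgets(line, 64, stdin);", 0)]
    java_train = [('stmt.executeQuery("SELECT * FROM t WHERE id=" + id);', 1),
                  ("ps.setString(1, id);", 0)]
    tests = {"C": [("strcpy(dst, src);", 1), ("fgets(b, 8, stdin);", 0)],
             "Java": [('st.executeQuery("DELETE FROM t WHERE k=" + k);', 1),
                      ("ps.setInt(2, k);", 0)]}
    print(per_language_recall(transfer(c_train, java_train), tests))
```

The shared `ID`/`NUM`/`STR` vocabulary is what lets weights learned on C carry over to Java in this sketch; on toy data this small, the printed recall values say nothing about the 72% reported for the paper's model.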