SE, AI, CR, LG · Jun 18, 2024

Variables are a Curse in Software Vulnerability Prediction

arXiv:2407.02509v1
Originality: Incremental advance
AI Analysis

This work targets software vulnerability prediction for developers and security analysts. It offers a new way to handle the variability of variable names, although it is an incremental improvement to graph-based deep learning techniques.

The paper tackles software vulnerability prediction by addressing the "curse" of variable naming, which prevents deep learning models from learning a program's intrinsic functionality. The proposed techniques, including name dependence edges and a 3-property encoding scheme, improve prediction accuracy and reduce memory usage by a factor of up to 30,000 compared with existing approaches.

Deep learning-based approaches to software vulnerability prediction currently rely mainly on the original text of the code as the node features of the code graph. As a result, they may learn a representation that is specific to the code text rather than one that captures the "intrinsic" functionality of a program hidden behind that text. One curse behind this problem is the infinite number of ways to name a variable. To lift this curse, in this work we introduce a new type of edge called name dependence, a type of abstract syntax graph based on name dependence, and an efficient node representation method called the 3-property encoding scheme. These techniques allow us to remove concrete variable names from code and help deep learning models learn the functionality of software hidden in diverse code expressions. The experimental results show that deep learning models built on these techniques outperform those based on existing approaches not only in vulnerability prediction but also in memory requirements. Our techniques can reduce memory usage by a factor on the order of 30,000 compared with existing approaches.
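The abstract does not define name dependence edges or the 3-property encoding in detail, so the following is only a minimal sketch of the general idea, built on assumptions of our own. It parses Python with the standard `ast` module, labels each node by its node type (keeping literal values), never stores an identifier string, and adds a hypothetical "name" edge between consecutive occurrences of the same variable or parameter name. The function `name_free_graph` and the edge kinds `"child"` and `"name"` are illustrative inventions, not the authors' construction; the sketch ignores scoping and does not implement the 3-property encoding.

```python
import ast
from collections import defaultdict


def name_free_graph(source: str):
    """Hypothetical sketch: build a graph of `source` with no variable names.

    Returns (labels, edges). labels[i] is the AST node type of node i
    (constants keep their value). edges holds (src, dst, kind) triples:
    kind "child" is a syntax edge, kind "name" links consecutive
    occurrences of the same name (an assumed form of name dependence).
    """
    tree = ast.parse(source)
    labels = []
    edges = []
    occurrences = defaultdict(list)  # name -> node ids, in pre-order

    def visit(node, parent_id):
        node_id = len(labels)  # fresh id per visit, so shared AST objects are fine
        if isinstance(node, ast.Constant):
            labels.append(f"Constant:{node.value!r}")
        else:
            labels.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id, "child"))
        # The name is used only as a grouping key; it never enters a label.
        if isinstance(node, ast.Name):
            occurrences[node.id].append(node_id)
        elif isinstance(node, ast.arg):
            occurrences[node.arg].append(node_id)
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store/Del markers to keep the sketch small
            visit(child, node_id)

    visit(tree, None)

    # Assumed name-dependence edges: chain consecutive uses of one name.
    for ids in occurrences.values():
        for src, dst in zip(ids, ids[1:]):
            edges.append((src, dst, "name"))

    return labels, sorted(edges)


if __name__ == "__main__":
    original = "def f(a, b):\n    total = a + b\n    return total * 2\n"
    renamed = "def g(x, y):\n    s = x + y\n    return s * 2\n"
    # Renaming changes the text but not the name-free graph.
    print(name_free_graph(original) == name_free_graph(renamed))  # True
```

Because the graph records which occurrences share a name but not what the name is, two snippets that differ only in naming map to the same graph, while a change in functionality (for example `+` replaced by `-`) still yields a different one. This is the property the abstract argues lets a model focus on functionality rather than on the spelling of identifiers.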

