SE · CR · LG · PL · Jun 15, 2020

Learning to map source code to software vulnerability using code-as-a-graph

arXiv:2006.08614v1 · 37 citations
Originality: Incremental advance
AI Analysis

The work aims to improve software security for developers and organizations through more effective vulnerability detection. It is an incremental advance because it builds on existing graph-based approaches.

The paper tackles the detection of software vulnerabilities in source code by training Graph Neural Networks on code represented as graphs. The approach outperforms static analyzers and other deep learning models on two of the three datasets evaluated.

We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, we ask whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a source code sample into a Code Property Graph. The extracted graph is then vectorized in a manner that preserves its semantic information. A Gated Graph Neural Network is then trained on several such graphs to automatically extract templates differentiating the graph of a vulnerable sample from a healthy one. Our model outperforms static analyzers, classic machine learning, as well as CNN and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)
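
The flow described in the abstract (typed code graph, vectorized nodes, gated message passing, graph-level decision) can be illustrated with a minimal sketch. This is not the AI4VA implementation: the toy graph, the node-type and edge-type vocabularies, the random untrained weights, and the mean-pooling readout are all assumptions made for illustration. The update rule follows the standard GRU-style Gated Graph Neural Network step.

```python
import math
import random

# Hypothetical vocabularies; a real Code Property Graph has far more node and edge kinds.
NODE_TYPES = ["assign", "call", "identifier", "index"]
EDGE_TYPES = ["ast", "cfg", "dfg"]

# Toy graph: node i has type node_labels[i]; edges are (src, dst, edge_type).
node_labels = ["assign", "call", "identifier", "index"]
edges = [(0, 1, "ast"), (0, 2, "ast"), (0, 3, "cfg"), (2, 3, "dfg")]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def ggnn_step(h, edges, params):
    """One gated propagation step over typed, directed edges."""
    dim = len(h[0])
    # Each node sums messages from its incoming edges, using one weight
    # matrix per edge type.
    msgs = [[0.0] * dim for _ in h]
    for src, dst, etype in edges:
        msgs[dst] = vadd(msgs[dst], matvec(params["edge"][etype], h[src]))
    new_h = []
    for hv, av in zip(h, msgs):
        # Update gate z and reset gate r, as in a GRU cell.
        z = [sigmoid(x) for x in vadd(matvec(params["Wz"], av), matvec(params["Uz"], hv))]
        r = [sigmoid(x) for x in vadd(matvec(params["Wr"], av), matvec(params["Ur"], hv))]
        rh = [ri * hi for ri, hi in zip(r, hv)]
        cand = [math.tanh(x) for x in vadd(matvec(params["Wh"], av), matvec(params["Uh"], rh))]
        # Blend the old state with the candidate state.
        new_h.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, hv, cand)])
    return new_h

def graph_score(h, w_out):
    """Assumed readout: mean-pool node states, then a logistic unit."""
    pooled = [sum(col) / len(h) for col in zip(*h)]
    return sigmoid(sum(w * x for w, x in zip(w_out, pooled)))

rng = random.Random(0)
dim = len(NODE_TYPES)
params = {"edge": {t: random_matrix(rng, dim, dim) for t in EDGE_TYPES}}
for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh"):
    params[name] = random_matrix(rng, dim, dim)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

# Vectorize nodes as one-hot type vectors, then run three propagation steps.
h = [one_hot(NODE_TYPES.index(label), dim) for label in node_labels]
for _ in range(3):
    h = ggnn_step(h, edges, params)

score = graph_score(h, w_out)  # untrained, so the value itself is meaningless
print(f"vulnerability score (untrained): {score:.3f}")
```

In a trained model the weight matrices and readout would be learned from labeled vulnerable and healthy graphs. Here they are random, so only the data flow is meaningful.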

