AI SE Sep 7, 2021

Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, Jim Laredo

arXiv:2109.03341v111.121 citations

Originality: Incremental advance

AI Analysis

This work addresses software security breaches by automating vulnerability detection. It is a domain-specific contribution whose enhancements to existing graph neural network methods are incremental.

The paper tackles the problem of identifying vulnerable code in software by developing a novel graph neural network that learns insecure patterns from code corpora, achieving improved prediction performance over multiple baseline approaches across two real-world datasets.

Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program, in order to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss metric that leverages the fine granularity of labeling. Our model outperforms multiple text, image and graph-based approaches, across two real-world datasets.
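The abstract does not spell out how the representations from the several parsed graphs are combined, so the following is only a rough sketch of the general idea, not the authors' model. It assumes two hypothetical edge sets over the same nodes (labeled "ast" and "cfg" here purely for illustration), runs one round of untrained mean-aggregation message passing on each graph separately, and fuses the per-graph readouts by concatenation. Learned weights, nonlinearities, and the paper's training loss are all omitted; every function name below is invented for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Vector = List[float]


def message_pass(features: List[Vector], edges: List[Edge]) -> List[Vector]:
    """One round of mean aggregation over undirected edges (no learned weights)."""
    neighbors: List[List[int]] = [[] for _ in features]
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    updated = []
    for node, own in enumerate(features):
        # Average the node's own vector together with its neighbors' vectors.
        group = [own] + [features[m] for m in neighbors[node]]
        updated.append([sum(v[d] for v in group) / len(group)
                        for d in range(len(own))])
    return updated


def readout(features: List[Vector]) -> Vector:
    """Mean-pool node vectors into a single graph-level vector."""
    dim = len(features[0])
    return [sum(v[d] for v in features) / len(features) for d in range(dim)]


def disaggregated_embedding(features: List[Vector],
                            graphs: Dict[str, List[Edge]]) -> Vector:
    """Encode each graph separately, then concatenate the readouts.

    Concatenation is an assumed fusion step chosen for simplicity.
    """
    embedding: Vector = []
    for name in sorted(graphs):  # fixed order keeps the layout deterministic
        embedding.extend(readout(message_pass(features, graphs[name])))
    return embedding


# Toy program with three nodes and two-dimensional node features.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
graphs = {
    "ast": [(0, 1), (0, 2)],  # hypothetical syntax-tree edges
    "cfg": [(1, 2)],          # hypothetical control-flow edges
}
embedding = disaggregated_embedding(node_features, graphs)
```

The resulting vector has one slice per graph, so a downstream classifier could weigh syntactic and control-flow evidence separately; a trained model would replace the plain averaging with learned layers.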
