CR, LG · May 23, 2023

Sequential Graph Neural Networks for Source Code Vulnerability Identification

arXiv:2306.05375v1
Originality: Incremental advance
AI Analysis

This paper addresses the cybersecurity challenge of automating vulnerability detection in C/C++ code. It appears incremental because it builds on existing graph neural network approaches.

The authors tackle source code vulnerability identification by curating a dataset from the CVE database and proposing a sequential graph neural network framework. They report state-of-the-art results on two datasets in comparisons with four baseline methods.

Vulnerability identification constitutes a task of high importance for cyber security. It is quite helpful for locating and fixing vulnerable functions in large applications. However, this task is rather challenging owing to the absence of reliable and adequately managed datasets and learning models. Existing solutions typically rely on human expertise to annotate datasets or specify features, which is prone to error. In addition, the learning models have a high rate of false positives. To bridge this gap, in this paper, we present a properly curated C/C++ source code vulnerability dataset, denoted as CVEFunctionGraphEmbeddings (CVEFGE), to aid in developing models. CVEFGE is automatically crawled from the CVE database, which contains authentic and publicly disclosed source code vulnerabilities. We also propose a learning framework based on graph neural networks, denoted SEquential Graph Neural Network (SEGNN) for learning a large number of code semantic representations. SEGNN consists of a sequential learning module, graph convolution, pooling, and fully connected layers. Our evaluations on two datasets and four baseline methods in a graph classification setting demonstrate state-of-the-art results.
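
The abstract lists SEGNN's stages (a sequential learning module, graph convolution, pooling, and fully connected layers) without giving details. The following is a minimal sketch of how such a pipeline could fit together for graph classification; it is not the authors' implementation. The recurrence type, mean-neighbour aggregation, mean pooling, layer sizes, random weights, undirected edges, and every function name here are assumptions chosen only for illustration.

```python
import math
import random


def rnn_encode(token_vectors, w_in, w_rec):
    """Sequential module (assumed: plain tanh recurrence).

    Reads one node's token vectors in order and returns the final hidden state.
    """
    hidden = [0.0] * len(w_rec)
    for x in token_vectors:
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(w_rec))
        ]
    return hidden


def graph_conv(node_feats, edges, weight):
    """One graph convolution (assumed: mean over self + neighbours, linear, ReLU)."""
    n, dim = len(node_feats), len(node_feats[0])
    neighbours = [[i] for i in range(n)]  # self-loop for every node
    for u, v in edges:  # edges treated as undirected (assumption)
        neighbours[u].append(v)
        neighbours[v].append(u)
    out = []
    for i in range(n):
        agg = [
            sum(node_feats[j][d] for j in neighbours[i]) / len(neighbours[i])
            for d in range(dim)
        ]
        out.append(
            [max(0.0, sum(row[d] * agg[d] for d in range(dim))) for row in weight]
        )
    return out


def mean_pool(node_feats):
    """Pooling (assumed: mean over nodes) to get one vector per graph."""
    n = len(node_feats)
    return [sum(f[d] for f in node_feats) / n for d in range(len(node_feats[0]))]


def classify(graph_vec, w_out, bias):
    """Fully connected output layer with a sigmoid: score in (0, 1)."""
    logit = sum(w * x for w, x in zip(w_out, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def segnn_like_forward(node_tokens, edges, params):
    """Chain the four stages: sequence encoder, graph conv, pooling, classifier."""
    node_feats = [
        rnn_encode(tokens, params["w_in"], params["w_rec"]) for tokens in node_tokens
    ]
    node_feats = graph_conv(node_feats, edges, params["w_gcn"])
    return classify(mean_pool(node_feats), params["w_out"], params["b_out"])


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes and untrained random weights, for shape-checking only.
rng = random.Random(0)
EMBED, HIDDEN, GCN = 4, 3, 3
params = {
    "w_in": random_matrix(HIDDEN, EMBED, rng),
    "w_rec": random_matrix(HIDDEN, HIDDEN, rng),
    "w_gcn": random_matrix(GCN, HIDDEN, rng),
    "w_out": [rng.uniform(-0.5, 0.5) for _ in range(GCN)],
    "b_out": 0.0,
}

# Toy function graph: 3 nodes (e.g. statements), each a short token sequence.
node_tokens = [
    [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(length)]
    for length in (2, 3, 1)
]
edges = [(0, 1), (1, 2)]

score = segnn_like_forward(node_tokens, edges, params)
print(f"vulnerability score (untrained, illustrative): {score:.3f}")
```

With untrained weights the score carries no meaning; the sketch only shows the data flow from per-node token sequences to a single graph-level prediction, which is the graph classification setting the abstract mentions.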

