Yawei Zhu

CRJan 8, 2020

VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector

Zhen Li, Deqing Zou, Shouhuai Xu et al.

Automatically detecting software vulnerabilities is an important problem that has attracted much attention from the academic research community. However, existing vulnerability detectors still cannot achieve the vulnerability detection capability and the locating precision that would warrant their adoption for real-world use. In this paper, we present a vulnerability detector that can simultaneously achieve a high detection capability and a high locating precision, dubbed Vulnerability Deep learning-based Locator (VulDeeLocator). In the course of designing VulDeeLocator, we encounter difficulties including how to accommodate semantic relations between the definitions of types as well as macros and their uses across files, how to accommodate accurate control flows and variable define-use relations, and how to achieve high locating precision. We solve these difficulties by using two innovative ideas: (i) leveraging intermediate code to accommodate extra semantic information, and (ii) using the notion of granularity refinement to pin down locations of vulnerabilities. When applied to 200 files randomly selected from three real-world software products, VulDeeLocator detects 18 confirmed vulnerabilities (i.e., true-positives). Among them, 16 vulnerabilities correspond to known vulnerabilities; the other two are not reported in the National Vulnerability Database (NVD) but have been "silently" patched by the vendor of Libav when releasing newer versions.

LGJul 18, 2018

SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities

Zhen Li, Deqing Zou, Shouhuai Xu et al.

The detection of software vulnerabilities (or vulnerabilities for short) is an important problem that has yet to be tackled, as manifested by the many vulnerabilities reported on a daily basis. This calls for machine learning methods for vulnerability detection. Deep learning is attractive for this purpose because it alleviates the requirement to manually define features. Despite the tremendous success of deep learning in other application domains, its applicability to vulnerability detection is not systematically understood. In order to fill this void, we propose the first systematic framework for using deep learning to detect vulnerabilities in C/C++ programs with source code. The framework, dubbed Syntax-based, Semantics-based, and Vector Representations (SySeVR), focuses on obtaining program representations that can accommodate syntax and semantic information pertinent to vulnerabilities. Our experiments with 4 software products demonstrate the usefulness of the framework: we detect 15 vulnerabilities that are not reported in the National Vulnerability Database. Among these 15 vulnerabilities, 7 are unknown and have been reported to the vendors, and the other 8 have been "silently" patched by the vendors when releasing newer versions of the pertinent software products.

Yawei Zhu

2 Papers