PL AI CR

Nov 3, 2017

SPARK: Static Program Analysis Reasoning and Retrieving Knowledge

Wasuwee Sodsong, Bernhard Scholz, Sanjay Chawla

arXiv:1711.01024v1

2.31 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating security analysis for software developers. It appears incremental because it builds on existing program analysis and machine learning techniques.

The authors address the problem of automatically generating security analyzers for programs. Their machine learning pipeline deduces symbolic rules from examples, and the results indicate that this is feasible for large codebases such as the OpenJDK library, which has millions of lines of code.

Program analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is secure or insecure based on symbolic rules that were deduced by our machine learning pipeline. The pipeline has two stages: a Recurrent Neural Network (RNN) and an Extractor that converts the RNN to symbolic rules. To evaluate the quality of the learned symbolic rules, we propose a sampling-based similarity measurement between two infinite regular languages. We conduct a case study using real-world data, and we discuss the limitations of existing techniques and possible future improvements. The results show that, with sufficient training data and a fair distribution of program paths, it is feasible to deduce symbolic security rules for the OpenJDK library, which has millions of lines of code.
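The abstract mentions a sampling-based similarity measurement between two infinite regular languages but does not spell it out here. The sketch below is only one plausible reading, not the authors' definition: it represents each language as a DFA, draws random words up to a bounded length, and reports the fraction on which both automata agree. The `DFA` class, the `sampled_similarity` function, the uniform length sampling, and the toy alphabet are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class DFA:
    """Hypothetical minimal DFA; missing transitions lead to rejection."""
    start: int
    accepting: set
    transitions: dict  # maps (state, symbol) -> next state

    def accepts(self, word):
        state = self.start
        for symbol in word:
            key = (state, symbol)
            if key not in self.transitions:
                return False
            state = self.transitions[key]
        return state in self.accepting


def sampled_similarity(a, b, alphabet, samples=10_000, max_len=12, seed=0):
    """Estimate how often two DFAs give the same verdict on random words.

    Assumption: word lengths are drawn uniformly from 0..max_len and
    symbols uniformly from the alphabet. The paper may sample differently.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(samples):
        length = rng.randint(0, max_len)
        word = [rng.choice(alphabet) for _ in range(length)]
        if a.accepts(word) == b.accepts(word):
            agree += 1
    return agree / samples


# Toy languages over {a, b}: "even number of a's" versus "every word".
even_a = DFA(
    start=0,
    accepting={0},
    transitions={(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
)
accept_all = DFA(
    start=0,
    accepting={0},
    transitions={(0, "a"): 0, (0, "b"): 0},
)

score = sampled_similarity(even_a, accept_all, alphabet="ab")
print(f"estimated agreement: {score:.3f}")
```

Because both languages are infinite, exhaustive comparison is impossible, which is why an estimate over sampled words is used. The estimate depends on the sampling distribution, so a different choice of length bound or symbol weights would give a different score.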
