Learning Invariants using Decision Trees
This work addresses the challenge of automated program verification for software engineers, offering an incremental improvement in invariant inference methods.
The paper tackles the problem of inferring inductive invariants for verifying program safety by formulating it as a binary classification task and proposes a new algorithm using decision trees to learn invariants as Boolean combinations of numerical inequalities. The algorithm successfully verifies C programs from benchmarks, scales well to large sample sets, and compares favorably to other ML-based techniques.
The problem of inferring an inductive invariant for verifying program safety can be formulated in terms of binary classification. This is a standard problem in machine learning: given a sample of good and bad points, one is asked to find a classifier that generalizes from the sample and separates the two sets. Here, the good points are the reachable states of the program, and the bad points are those that reach a safety property violation. Thus, a learned classifier is a candidate invariant. In this paper, we propose a new algorithm that uses decision trees to learn candidate invariants in the form of arbitrary Boolean combinations of numerical inequalities. We have used our algorithm to verify C programs taken from the literature. The algorithm is able to infer safe invariants for a range of challenging benchmarks and compares favorably to other ML-based invariant inference techniques. In particular, it scales well to large sample sets.