PLOct 13, 2023Code
Ranking LLM-Generated Loop Invariants for Program VerificationSaikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury et al.
Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a {\it re-ranking} approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier. The source code and the experimental data for this paper are available in \url{https://github.com/microsoft/NeuralInvariantRanker}.
PLNov 14, 2023
Finding Inductive Loop Invariants using Large Language ModelsAdharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty et al.
Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding inductive loop invariants is an undecidable problem, and despite a long history of research towards practical solutions, it remains far from a solved problem. This paper investigates the capabilities of the Large Language Models (LLMs) in offering a new solution towards this old, yet important problem. To that end, we first curate a dataset of verification problems on programs with loops. Next, we design a prompt for exploiting LLMs, obtaining inductive loop invariants, that are checked for correctness using sound symbolic tools. Finally, we explore the effectiveness of using an efficient combination of a symbolic tool and an LLM on our dataset and compare it against a purely symbolic baseline. Our results demonstrate that LLMs can help improve the state-of-the-art in automated program verification.
18.3SEApr 9
AFGNN: API Misuse Detection using Graph Neural Networks and ClusteringPonnampalam Pirapuraj, Tamal Mondal, Sharanya Gupta et al.
Application Programming Interfaces (APIs) are crucial to software development, enabling integration of existing systems with new applications by reusing tried and tested code, saving development time and increasing software safety. In particular, the Java standard library APIs, along with numerous third-party APIs, are extensively utilized in the development of enterprise application software. However, their misuse remains a significant source of bugs and vulnerabilities. Furthermore, due to the limited examples in the official API documentation, developers often rely on online portals and generative AI models to learn unfamiliar APIs, but using such examples may introduce unintentional errors in the software. In this paper, we present AFGNN, a novel Graph Neural Network (GNN)-based framework for efficiently detecting API misuses in Java code. AFGNN uses a novel API Flow Graph (AFG) representation that captures the API execution sequence, data, and control flow information present in the code to model the API usage patterns. AFGNN uses self-supervised pre-training with AFG representation to effectively compute the embeddings for unknown API usage examples and cluster them to identify different usage patterns. Experiments on popular API usage datasets show that AFGNN significantly outperforms state-of-the-art small language models and API misuse detectors.
PLMay 16, 2020
Distributed Bounded Model CheckingPrantik Chatterjee, Subhajit Roy, Bui Phi Diep et al.
Program verification is a resource-hungry task. This paper looks at the problem of parallelizing SMT-based automated program verification, specifically bounded model-checking, so that it can be distributed and executed on a cluster of machines. We present an algorithm that dynamically unfolds the call graph of the program and frequently splits it to create sub-tasks that can be solved in parallel. The algorithm is adaptive, controlling the splitting rate according to available resources, and also leverages information from the SMT solver to split where most complexity lies in the search. We implemented our algorithm by modifying CORRAL, the verifier used by Microsoft's Static Driver Verifier (SDV), and evaluate it on a series of hard SDV benchmarks.
SEMar 7, 2014
Automatically finding atomic regions for fixing bugs in Concurrent programsSaurabh Joshi, Akash Lal
This paper presents a technique for automatically constructing a fix for buggy concurrent programs: given a concurrent program that does not satisfy user-provided assertions, we infer atomic blocks that fix the program. An atomic block protects a piece of code and ensures that it runs without interruption from other threads. Our technique uses a verification tool as a subroutine to find the smallest atomic regions that remove all bugs in a given program. Keeping the atomic regions small allows for maximum concurrency. We have implemented our approach in a tool called AtomicInf. A user of AtomicInf can choose between strong and weak atomicity semantics for the inferred fix. While the former is simpler to find, the latter provides more information about the bugs that got fixed. We ran AtomicInf on several benchmarks and came up with the smallest and the most precise atomic regions in all of them. We implemented an earlier technique to our setting and observed that AtomicInf is 1.7 times faster on an average as compared to an earlier approach.