Lucja Kot

SE | Jan 9, 2019 | Code

Automated Customized Bug-Benchmark Generation

Vineeth Kashyap, Jason Ruchti, Lucja Kot et al.

We introduce Bug-Injector, a system that automatically creates benchmarks for customized evaluation of static analysis tools. We share a benchmark generated using Bug-Injector and illustrate its efficacy by using it to evaluate the recall of two leading open-source static analysis tools: Clang Static Analyzer and Infer. Bug-Injector works by inserting bugs based on bug templates into real-world host programs. It runs tests on the host program to collect dynamic traces, searches the traces for a point where the state satisfies the preconditions for some bug template, then modifies the host program to inject a bug based on that template. Injected bugs are used as test cases in a static analysis tool evaluation benchmark. Every test case is accompanied by a program input that exercises the injected bug. We have identified a broad range of requirements and desiderata for bug benchmarks; our approach generates on-demand test benchmarks that meet these requirements. It also allows us to create customized benchmarks suitable for evaluating tools for a specific use case (e.g., a given codebase and set of bug types). Our experimental evaluation demonstrates the suitability of our generated benchmark for evaluating static bug-detection tools and for comparing the performance of different tools.
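The trace-search step described in the abstract can be illustrated with a minimal sketch. It is not Bug-Injector's implementation: the `TracePoint` and `BugTemplate` types, the variable names `buf_size` and `i`, and the example precondition are all hypothetical, and program state is simplified to a dictionary of integer values.

```python
from typing import Callable, Dict, List, NamedTuple, Optional

# Simplified program state: variable name -> observed integer value (assumption).
State = Dict[str, int]


class TracePoint(NamedTuple):
    location: str  # hypothetical "file:line" label for a point in the host program
    state: State   # values observed at that point while running a test


class BugTemplate(NamedTuple):
    name: str
    precondition: Callable[[State], bool]  # must hold for the bug to be injectable


def find_injection_point(trace: List[TracePoint],
                         template: BugTemplate) -> Optional[TracePoint]:
    """Return the first trace point whose state satisfies the template's precondition."""
    for point in trace:
        if template.precondition(point.state):
            return point
    return None


def has_in_bounds_index(state: State) -> bool:
    # Hypothetical precondition: a buffer size and a valid index are both in scope.
    return ("buf_size" in state and "i" in state
            and 0 <= state["i"] < state["buf_size"])


# A made-up dynamic trace collected from one test run of a host program.
trace = [
    TracePoint("host.c:10", {"n": 3}),
    TracePoint("host.c:17", {"buf_size": 8, "i": 2}),
    TracePoint("host.c:25", {"buf_size": 8, "i": 5}),
]
template = BugTemplate("buffer-overrun", has_in_bounds_index)
match = find_injection_point(trace, template)
```

In this sketch the returned point marks where a source rewrite would insert the templated bug, and the test input that produced the trace is the input that exercises it.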

PL | Sep 18, 2020

Out of Sight, Out of Place: Detecting and Assessing Swapped Arguments

Roger Scott, Joseph Ranieri, Lucja Kot et al.

Programmers often add meaningful information about program semantics when naming program entities such as variables, functions, and macros. However, static analysis tools typically discount this information when they look for bugs in a program. In this work, we describe the design and implementation of a static analysis checker called SwapD, which uses the natural language information in programs to warn about mistakenly-swapped arguments at call sites. SwapD combines two independent detection strategies to improve the effectiveness of the overall checker. We present the results of a comprehensive evaluation of SwapD over a large corpus of C and C++ programs totaling 417 million lines of code. In this evaluation, SwapD found 154 manually-vetted real-world cases of mistakenly-swapped arguments, suggesting that such errors, while not pervasive in released code, are a real problem and a worthwhile target for static analysis.
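To make the idea of using names to spot swapped arguments concrete, here is a minimal sketch of one possible name-based heuristic. It is not SwapD's algorithm (the abstract does not specify its two detection strategies): the function names, the `margin` threshold, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_suspicious_swaps(params: List[str], args: List[str],
                          margin: float = 0.5) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where swapping args i and j would match the
    parameter names better than the call as written, by more than `margin`."""
    n = min(len(params), len(args))
    suspects = []
    for i in range(n):
        for j in range(i + 1, n):
            as_written = (name_similarity(args[i], params[i])
                          + name_similarity(args[j], params[j]))
            if_swapped = (name_similarity(args[i], params[j])
                          + name_similarity(args[j], params[i]))
            if if_swapped - as_written > margin:
                suspects.append((i, j))
    return suspects


# Hypothetical callee declared as resize(width, height), called as resize(height, width).
swapped = find_suspicious_swaps(["width", "height"], ["height", "width"])
# The same callee called with arguments in the declared order.
in_order = find_suspicious_swaps(["width", "height"], ["width", "height"])
```

Here `swapped` contains the pair `(0, 1)` and `in_order` is empty. A real checker would also have to handle arguments that are expressions rather than plain identifiers, and names that carry no useful meaning.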


2 Papers