Cristian Cadar

SE
3papers
119citations
Novelty45%
AI Score43

3 Papers

93.3SEApr 17Code
Evaluating LLM Agents on Automated Software Analysis Tasks

Islem Bouzenia, Cristian Cadar, Michael Pradel

Numerous software analysis tools exist today, yet applying them to diverse open-source projects remains challenging due to environment setup, dependency resolution, and tool configuration. LLM-based agents offer a potential solution, yet no prior work has systematically studied their effectiveness on the specific task of automated software analysis, which, unlike issue solving or general environment setup, requires installing and configuring a separate analysis tool alongside the target project, generating tool-specific prerequisites, and validating that the tool produces meaningful analysis outputs. We introduce AnalysisBench, a benchmark of 35 tool-project pairs spanning seven analysis tools and ten diverse C/C++ and Java projects, each with a manually constructed reference setup. Using AnalysisBench, we evaluate four agent architectures across four LLM backends. Our custom agent, AnalysisAgent, achieves manually verified success rates of 94% (Gemini-3-Flash, 33/35 tasks), compared to 77% for the best baseline (ExecutionAgent). Beyond quantitative results, we identify key limitations in existing agents, including stage mixing, poor error localization, and premature termination, and show that agentic architecture matters more than LLM capability alone. We further find that whole-program analyses and symbolic execution are the most difficult tasks, that Java toolchains pose greater challenges than C/C++, and that LLM-self-validated success consistently overstates manually verified success.

CRJan 11, 2022
SnapFuzz: An Efficient Fuzzing Framework for Network Applications

Anastasios Andronidis, Cristian Cadar

In recent years, fuzz testing has benefited from increased computational power and important algorithmic advances, leading to systems that have discovered many critical bugs and vulnerabilities in production software. Despite these successes, not all applications can be fuzzed efficiently. In particular, stateful applications such as network protocol implementations are constrained by their low fuzzing throughput and the need to develop fuzzing harnesses that reset their state and isolate their side effects. In this paper, we present SnapFuzz, a novel fuzzing framework for network applications. SnapFuzz offers a robust architecture that transforms slow asynchronous network communication into fast synchronous communication, snapshots the target at the latest point at which it is safe to do so, speeds up all file operations by redirecting them to a custom in-memory filesystem, and removes the need for many fragile modifications, such as configuring time delays or writing clean-up scripts, together with several other improvements. Using SnapFuzz, we fuzzed five popular networking applications: LightFTP, TinyDTLS, Dnsmasq, LIVE555 and Dcmqrscp. We report impressive performance speedups of 62.8x, 41.2x, 30.6x, 24.6x, and 8.4x, respectively, with significantly simpler fuzzing harnesses in all cases. Through its performance advantage, SnapFuzz has also found 12 extra crashes compared to AFLNet in these applications.

SEFeb 25, 2019
A Systematic Impact Study for Fuzzer-Found Compiler Bugs

Michaël Marcozzi, Qiyi Tang, Alastair F. Donaldson et al.

Despite much recent interest in compiler randomized testing (fuzzing), the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically due to triggering of the bug; and (3) how much such changes do cause regression test suite failures and could be used to manually trigger divergences during execution. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or by formal verification. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for some packages, but rarely affect their syntax and cause two failures in total when running their test suites. User-reported and formal verification bugs do not exhibit a higher impact, with less frequently triggered bugs and one test failure. Our manual analysis of a selection of bugs, either fuzzer-found or not, suggests that none can easily trigger a runtime divergence on the packages considered in the analysis, and that in general they affect only corner cases.