PL SEJan 31

Defusing Logic Bombs in Symbolic Execution with LLM-Generated Ghost Code

Dimitrios Stamatios Bouras, Sergey Mechtaev

arXiv:2603.19239h-index: 2

Originality Incremental advance

AI Analysis

This addresses the problem of improving symbolic execution for real-world codebases, offering a practical solution for program analysis researchers and practitioners, though it is incremental as it builds on existing hybrid approaches.

The paper tackles the limitations of symbolic execution in handling solver-hostile code by introducing Gordian, a hybrid framework that uses LLMs to generate ghost code to aid SMT solvers, resulting in average coverage improvements of 52-84% over traditional methods and 86-419% over LLM-based techniques while reducing LLM token usage by 90-96%.

Symbolic execution is a powerful program analysis technique, but its effectiveness is fundamentally limited by solver-hostile program fragments, complex numerical reasoning, and unbounded heap structures. Recent work proposed replacing constraint solvers with large language models (LLMs) to bypass these limitations, but such approaches struggle to analyze real-world codebases, where deep execution paths require globally consistent reasoning across many interacting constraints. We present Gordian, a hybrid symbolic execution framework that uses LLMs selectively to generate lightweight ghost code that aids an SMT solver in handling solver-hostile code fragments, while preserving its precise, global reasoning capability. In particular, we propose three types of ghost code: (1) inversion of difficult code fragments with iterative bidirectional constraint propagation, (2) modeling via solver-friendly surrogates while preserving relevant behavior, and (3) semantic partitioning of unbounded heap spaces. We implemented Gordian on top of the KLEE symbolic execution engine and evaluated it on synthetic "logic bombs" capturing distinct symbolic reasoning challenges, a popular mathematical library FDLibM, and three structured-input programs (libexpat, jq, and bc). Across all benchmarks, Gordian improves coverage on average by 52-84% over traditional symbolic execution baselines, and by 86-419% over LLM-based techniques, while reducing LLM token usage by an average of 90-96%. This highlights the practicality and effectiveness of this approach in real-world settings.

View on arXiv PDF

Similar