48.7SEMay 5
Large Language Model assisted Hybrid FuzzingRuijie Meng, Gregory J. Duck, Abhik Roychoudhury
Greybox fuzzing is one of the most popular methods for detecting software vulnerabilities, which conducts a biased random search within the program input space. To enhance its effectiveness in achieving deep coverage of program behaviors, greybox fuzzing is often combined with concolic execution, which performs a path-sensitive search over the domain of program inputs. In hybrid fuzzing, conventional greybox fuzzing is followed by concolic execution in an iterative loop, where reachability roadblocks encountered by greybox fuzzing are tackled by concolic execution. However, such hybrid fuzzing still suffers from difficulties conventionally faced by concolic execution, such as the need for environment modeling and system call support. In this work, we explore the potential of developing "smart" concolic execution empowered by Large Language Models (LLMs), leveraging their knowledge of code semantics during constraint computing and solving. When coverage-based greybox fuzzing reaches a roadblock in terms of reaching certain branches, we conduct a slicing on the execution trace and suggest modifications of the input to reach the relevant branches. The LLM is used as a solver to generate the modified input to reach the desired branches. Compared with state-of-the-art hybrid fuzzers CoFuzz, Intriguer, and QSYM, our LLM-based hybrid fuzzer HyllFuzz(pronounced "hill fuzz") covers 31.43%, 44.56%, and 59.48% more code branches, respectively. Furthermore, the LLM-based concolic execution in HyllFuzz takes a time that is 3--19 times faster than the concolic execution running in existing hybrid fuzzing tools. In extensively tested real-world subjects, HyllFuzz exposed seven previously unknown bugs. This experience shows that LLMs can be effectively inserted into the iterative loop of hybrid fuzzers to efficiently expose more program behaviors.
21.4PLApr 15
Persistent Iterators with Value SemanticsYihe Li, Gregory J. Duck
Iterators are a fundamental programming abstraction for traversing and modifying elements in containers in mainstream imperative languages such as C++. Iterators provide a uniform access mechanism that hides low-level implementation details of the underlying data structure. However, iterators over mutable containers suffer from well-known hazards including invalidation, aliasing, data races, and subtle side effects. Immutable data structures, as used in functional programming languages, avoid the pitfalls of mutation but rely on a very different programming model based on recursion and higher-order combinators rather than iteration. However, these combinators are not always well-suited to expressing certain algorithms, and recursion can expose implementation details of the underlying data structure. In this paper, we propose persistent iterators -- a new abstraction that reconciles the familiar iterator-based programming style of imperative languages with the semantics of persistent data structures. A persistent iterator snapshots the version of its underlying container at creation, ensuring safety against invalidation and aliasing. Iterator operations operate on the iterator-local copy of the container, giving true value semantics: variables can be rebound to new persistent values while previous versions remain accessible. We implement our approach in the form of LibFPP -- a C++ container library providing persistent vectors, maps, sets, strings, and other abstractions as persistent counterparts to the Standard Template Library (STL). Our evaluation shows that LibFPP retains the expressiveness of iterator-based programming, eliminates iterator-invalidation, and achieves asymptotic complexities comparable to STL implementations. Our design targets use cases where persistence and safety are desired, while allowing developers to retain familiar iterator-based programming patterns.