Attention Distance: A Novel Metric for Directed Fuzzing with Large Language Models
This addresses the problem of inefficient target localization in complex binaries for software security researchers and practitioners, representing a novel method rather than an incremental improvement.
The paper tackles the problem of Directed Grey-Box Fuzzing (DGF) in software security testing by introducing attention distance, a metric that uses large language models to analyze logical relationships between code elements instead of just physical distances. The result is a 3.43× average increase in testing efficiency over traditional methods and 2.89× to 7.13× improvements over state-of-the-art fuzzers in vulnerability reproduction experiments.
In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization and excellent detection performance. However, existing approaches measure only the physical distance between seed execution paths and target locations, overlooking logical relationships among code segments. This omission can yield redundant or misleading guidance in complex binaries, weakening DGF's real-world effectiveness. To address this, we introduce \textbf{attention distance}, a novel metric that leverages a large language model's contextual analysis to compute attention scores between code elements and reveal their intrinsic connections. Under the same AFLGo configuration -- without altering any fuzzing components other than the distance metric -- replacing physical distances with attention distances across 38 real vulnerability reproduction experiments delivers a \textbf{3.43$\times$} average increase in testing efficiency over the traditional method. Compared to state-of-the-art directed fuzzers DAFL and WindRanger, our approach achieves \textbf{2.89$\times$} and \textbf{7.13$\times$} improvements, respectively. To further validate the generalizability of attention distance, we integrate it into DAFL and WindRanger, where it also consistently enhances their original performance. All related code and datasets are publicly available at https://github.com/TheBinKing/Attention\_Distance.git.