Deep Learning-based Binary Analysis for Vulnerability Detection in x86-64 Machine Code
This paper addresses the challenge of efficient, lightweight vulnerability detection for binary analysis in cybersecurity, though the contribution appears incremental: it builds on existing deep learning approaches and differs mainly in its focus on raw machine code.
This paper tackles vulnerability detection in x86-64 machine code by exploring deep learning models that extract features directly from raw machine code rather than from disassembled binaries. It finds that graph-based models consistently outperform sequential models and that machine code contains sufficient information for effective detection.
While much of the current research in deep learning-based vulnerability detection relies on disassembled binaries, this paper explores the feasibility of extracting features directly from raw x86-64 machine code. Although assembly language is more interpretable for humans, it requires more complex models to capture token-level context. In contrast, machine code may enable more efficient, lightweight models and preserve all information that might be lost in disassembly. This paper approaches the task of vulnerability detection through an exploratory study on two specific deep learning model architectures and aims to systematically evaluate their performance across three vulnerability types. The results demonstrate that graph-based models consistently outperform sequential models, emphasizing the importance of control flow relationships, and that machine code contains sufficient information for effective vulnerability discovery.
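To make the contrast between the two input views concrete, the following is a minimal sketch of how raw machine code bytes might be prepared for a sequential model versus a graph-based one. It is not the paper's pipeline: the function names, the padding token `PAD_ID`, and the block boundaries and edges in the example are all hypothetical, and the sketch assumes that basic-block byte ranges and control flow edges are already available from some earlier step.

```python
from typing import Dict, List, Tuple

PAD_ID = 256  # hypothetical padding token; raw byte values use ids 0..255


def bytes_to_tokens(code: bytes, max_len: int) -> List[int]:
    """Sequential view: one token id per raw byte, truncated or padded to max_len."""
    ids = list(code[:max_len])
    return ids + [PAD_ID] * (max_len - len(ids))


def build_graph(
    code: bytes,
    blocks: List[Tuple[int, int]],
    edges: List[Tuple[int, int]],
) -> Tuple[List[List[int]], Dict[int, List[int]]]:
    """Graph view: each node holds the raw bytes of one block.

    `blocks` are (start, end) byte offsets and `edges` are (src, dst) block
    indices. Both are assumed inputs; this sketch does not recover them.
    """
    nodes = [list(code[start:end]) for start, end in blocks]
    adjacency: Dict[int, List[int]] = {i: [] for i in range(len(blocks))}
    for src, dst in edges:
        adjacency[src].append(dst)
    return nodes, adjacency


# x86-64 bytes for: push rbp; mov rbp, rsp; pop rbp; ret
SAMPLE = bytes.fromhex("554889e55dc3")

tokens = bytes_to_tokens(SAMPLE, max_len=8)

# Illustrative split into two blocks joined by one edge (not a real branch).
nodes, adjacency = build_graph(SAMPLE, blocks=[(0, 4), (4, 6)], edges=[(0, 1)])
```

The sequential view discards structure and keeps only byte order, while the graph view keeps the same bytes but attaches them to nodes linked by control flow edges, which is the relationship the abstract credits for the stronger results of graph-based models.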