CR LG PL SE | Oct 4, 2022

NeuDep: Neural Binary Memory Dependence Analysis

Kexin Pei, Dongdong She, Michael Wang, Scott Geng, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Jana, Baishakhi Ray

arXiv:2210.02853v1 | 7.08 citations | h-index: 47

Originality: Highly original

AI Analysis

This paper addresses a critical task in binary analysis for reverse engineering and security applications, offering a novel method that is more precise and faster than existing approaches.

The paper tackles the problem of statically determining memory dependencies in binary code, which is challenging because of compiler optimizations and the lack of symbols. It presents NeuDep, a machine-learning-based approach that combines self-supervised pretraining with supervised finetuning. The results show that NeuDep is 1.5x more precise and 3.5x faster than the current state of the art.

Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging as statically computing precise alias information is undecidable in theory. The problem is aggravated at the binary level by compiler optimizations and the absence of symbols and types. Existing approaches either produce significant spurious dependencies due to conservative analysis or scale poorly to complex binaries. We present a new machine-learning-based approach to predict memory dependencies by exploiting the model's learned knowledge about how binary programs execute. Our approach features (i) a self-supervised procedure that pretrains a neural net to reason over binary code and its dynamic value flows through memory addresses, followed by (ii) supervised finetuning to infer the memory dependencies statically. To facilitate efficient learning, we develop dedicated neural architectures to encode the heterogeneous inputs (i.e., code, data values, and memory addresses from traces) with specific modules and fuse them with a composition learning strategy. We implement our approach in NeuDep and evaluate it on 41 popular software projects compiled by 2 compilers, 4 optimizations, and 4 obfuscation passes. We demonstrate that NeuDep is more precise (1.5x) and faster (3.5x) than the current state-of-the-art. Extensive probing studies on security-critical reverse engineering tasks suggest that NeuDep understands memory access patterns, learns function signatures, and is able to match indirect calls. All these tasks either assist or benefit from inferring memory dependencies. Notably, NeuDep also outperforms the current state-of-the-art on these tasks.
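
To make the notion of a memory dependence concrete, the following minimal Python sketch marks two instructions as dependent when their recorded accesses in an execution trace touch at least one common byte. It is an illustration only, not NeuDep's pipeline: the `Access` record, the `overlaps` and `dependent_pairs` helpers, and the sample trace are all hypothetical, and the byte-overlap criterion is an assumption about how "the same memory location" might be checked.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    """One memory access observed in a (hypothetical) execution trace."""
    insn: int  # index of the instruction that performed the access
    addr: int  # first byte address touched
    size: int  # number of bytes touched


def overlaps(a: Access, b: Access) -> bool:
    """True if the two accesses share at least one byte."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


def dependent_pairs(trace: list[Access]) -> set[tuple[int, int]]:
    """Instruction pairs whose observed accesses overlap in this trace."""
    pairs = set()
    for a, b in combinations(trace, 2):
        if a.insn != b.insn and overlaps(a, b):
            pairs.add((min(a.insn, b.insn), max(a.insn, b.insn)))
    return pairs


# Hypothetical trace: instruction 2 reads the upper half of what 0 wrote;
# instructions 1 and 3 touch adjacent but disjoint ranges.
trace = [
    Access(insn=0, addr=0x1000, size=8),
    Access(insn=1, addr=0x2000, size=4),
    Access(insn=2, addr=0x1004, size=4),
    Access(insn=3, addr=0x2004, size=4),
]
print(sorted(dependent_pairs(trace)))  # [(0, 2)]
```

A trace-based check like this only reveals dependencies that happen to occur in the observed executions. The abstract's setting is harder: the dependencies must be inferred statically, without running the binary.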
