CR · AI · IR · LG · Jun 2, 2024

Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets

arXiv:2406.02606v2
AI Analysis

The approach offers a general solution to binary code similarity detection, with applications in malware analysis, vulnerability research, and license violation detection. It appears incremental, however, because it builds on existing graph neural network and binary analysis methods.

The paper tackles binary code similarity detection by proposing a graph neural network architecture with call graphlets, achieving comparable or state-of-the-art performance across cross-architecture, mono-architecture, and zero-shot tasks in evaluations on five datasets.

Binary code similarity detection is an important problem with applications in areas such as malware analysis, vulnerability research, and license violation detection. This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets. A call graphlet encodes the neighborhood around each function in a binary executable, capturing the local and global context through a series of statistical features. A specialized graph neural network model operates on this graph representation, learning to map it to a feature vector that encodes semantic binary code similarities using deep metric learning. The proposed approach is evaluated across five distinct datasets covering different architectures, compiler toolchains, and optimization levels. Experimental results show that the combination of call graphlets and the novel graph neural network architecture achieves comparable or state-of-the-art performance compared to baseline techniques across cross-architecture, mono-architecture, and zero-shot tasks. In addition, our proposed approach performs well when evaluated against an out-of-domain function inlining task. The work provides a general and effective graph neural network-based solution for conducting binary code similarity detection.
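To make the idea of a neighborhood around a function concrete, here is a minimal sketch in Python. It assumes a call graphlet can be approximated by the one-hop neighborhood of a function in the call graph (its direct callers and callees, plus the call edges among them). The abstract does not spell out the exact construction or the statistical node features, so the function name `call_graphlet`, the edge-list input format, and the one-hop choice are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def call_graphlet(call_edges, target):
    """Return (nodes, edges) for an assumed one-hop neighborhood of `target`.

    call_edges: iterable of (caller, callee) pairs from a binary's call graph.
    This is a hypothetical approximation; the paper may define the
    neighborhood differently and attaches statistical features to each node.
    """
    call_edges = set(call_edges)  # drop duplicate call sites
    callees = defaultdict(set)
    callers = defaultdict(set)
    for src, dst in call_edges:
        callees[src].add(dst)
        callers[dst].add(src)

    # The target function plus everything it calls or is called by.
    nodes = {target} | callees[target] | callers[target]
    # Keep only the call edges whose endpoints both lie in the neighborhood.
    edges = sorted((s, d) for s, d in call_edges if s in nodes and d in nodes)
    return nodes, edges

# Toy call graph with made-up function names.
toy_edges = [
    ("main", "parse"), ("main", "run"), ("run", "parse"),
    ("parse", "read"), ("read", "log"), ("run", "log"),
]
nodes, edges = call_graphlet(toy_edges, "parse")
```

In this toy graph, the graphlet for `parse` contains `main`, `run`, `parse`, and `read`; `log` is excluded because it is two hops away. A model in the spirit of the abstract would then embed such a subgraph into a vector and compare vectors to rank candidate functions.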

Code Implementations: 1 repo