CR | AI | IR | LG | Oct 5, 2023

FASER: Binary Code Similarity Search through the use of Intermediate Representations

arXiv:2310.03605v3 | 8 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of identifying functions of interest in cross-architecture software, which matters for malware analysis, supply chain security, and vulnerability research. It represents an incremental improvement.

The paper tackles cross-architecture binary code similarity search by proposing FASER, which combines binary intermediate representations with long document transformers. It reports better results than all baselines on both a general function search task and a targeted vulnerability search task.

Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-architecture binary code similarity search has been explored in numerous studies, which have drawn on a wide range of data sources. These data sources are typically common structures derived from binaries, such as function control flow graphs or binary-level call graphs, the output of the disassembly process, or the outputs of dynamic analysis. One data source that has received less attention is binary intermediate representations. Binary intermediate representations possess two interesting properties: they are cross-architecture by their very nature, and they encode the semantics of a function explicitly to support downstream usage. In this paper we propose Function as a String Encoded Representation (FASER), which combines long document transformers with intermediate representations to create a model capable of cross-architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches on two tasks: a general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.
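
To make the search setting concrete, here is a minimal sketch of function search over string-encoded intermediate representations. It is not the FASER model: a toy hashed bag-of-tokens encoder stands in for the long document transformer, and the IR snippets, the function names and the helpers `normalise_ir`, `embed` and `search` are all hypothetical. It only illustrates the shape of the task, which is to encode each function's IR as a string, embed it, and rank candidates by cosine similarity to the query.

```python
import hashlib
import math
from collections import Counter


def normalise_ir(ir_text):
    """Split an IR listing into whitespace-separated tokens (hypothetical scheme)."""
    return ir_text.split()


def embed(tokens, dim=64):
    """Toy stand-in for a learned encoder: hashed bag-of-tokens, L2-normalised."""
    vec = [0.0] * dim
    for tok, count in Counter(tokens).items():
        # md5 gives a hash that is stable across Python processes
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def search(query_ir, pool):
    """Rank every candidate in the pool by similarity to the query, best first."""
    q = embed(normalise_ir(query_ir))
    scored = [(name, cosine(q, embed(normalise_ir(ir)))) for name, ir in pool.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented IR-like strings; not the output of any real lifter or disassembler.
pool = {
    "copy_loop": "t0 = load r_src ; store r_dst t0 ; r_src = add r_src 1 ; "
                 "r_dst = add r_dst 1 ; r_cnt = sub r_cnt 1 ; branch_nz r_cnt loop",
    "sum_loop": "t0 = load r_src ; r_acc = add r_acc t0 ; r_src = add r_src 1 ; "
                "r_cnt = sub r_cnt 1 ; branch_nz r_cnt loop",
    "return_zero": "r_ret = const 0 ; ret",
}

# Idealised case: the same function lifted from another architecture
# produces an identical IR string, so it should rank first.
query_ir = pool["copy_loop"]

ranked = search(query_ir, pool)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In a real system the IR of the same function compiled for two architectures would be similar rather than identical, which is why a learned encoder is used in place of token counting.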

Code Implementations: 1 repo
