SE | AI | CR | Aug 24, 2023

kTrans: Knowledge-Aware Transformer for Binary Code Embedding

arXiv:2308.12659v1 | 19 citations | h-index: 9 | Has Code
Originality: Incremental advance
AI Analysis

This work aims to improve reverse engineering tasks that matter to security analysts, such as binary code similarity detection. It is rated incremental because it builds on existing Transformer-based methods.

The paper tackles binary code embedding by incorporating prior knowledge of assembly language into a Transformer model. It reports outperforming state-of-the-art approaches by 5.2%, 6.8%, and 12.6% on binary code similarity detection, function type recovery, and indirect call recognition, respectively.

Binary Code Embedding (BCE) has important applications in various reverse engineering tasks such as binary code similarity detection, type recovery, control-flow recovery and data-flow analysis. Recent studies have shown that the Transformer model can comprehend the semantics of binary code to support downstream tasks. However, existing models overlook the prior knowledge of assembly language. In this paper, we propose a novel Transformer-based approach, namely kTrans, to generate knowledge-aware binary code embedding. By feeding explicit knowledge as additional inputs to the Transformer, and fusing implicit knowledge with a novel pre-training task, kTrans provides a new perspective on incorporating domain knowledge into a Transformer framework. We inspect the generated embeddings with outlier detection and visualization, and also apply kTrans to 3 downstream tasks: Binary Code Similarity Detection (BCSD), Function Type Recovery (FTR) and Indirect Call Recognition (ICR). Evaluation results show that kTrans can generate high-quality binary code embeddings, and outperforms state-of-the-art (SOTA) approaches on downstream tasks by 5.2%, 6.8%, and 12.6% respectively. kTrans is publicly available at: https://github.com/Learner0x5a/kTrans-release
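To make "feeding explicit knowledge as additional inputs" concrete, here is a minimal sketch of one common way to do it: each token's embedding is summed with one embedding per knowledge field before it enters the Transformer. This is an illustration under stated assumptions, not the kTrans implementation. The knowledge fields (`kind`, `access`), the vocabularies, the table names, and the toy dimension are all hypothetical, and the random tables stand in for learned parameters.

```python
import random

DIM = 8  # toy embedding width; an assumption for illustration only


def make_table(vocab, dim=DIM, seed=0):
    """Build a toy embedding table: one fixed random vector per vocabulary entry."""
    rng = random.Random(seed)
    return {item: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for item in vocab}


# Hypothetical vocabularies. In a trained model these tables would be learned.
TOKEN_TABLE = make_table(["mov", "add", "rax", "rbx", "0x10"], seed=1)
KIND_TABLE = make_table(["opcode", "register", "immediate"], seed=2)
ACCESS_TABLE = make_table(["none", "read", "write"], seed=3)


def embed_token(token, kind, access):
    """Sum the token embedding with one embedding per knowledge field."""
    vectors = (TOKEN_TABLE[token], KIND_TABLE[kind], ACCESS_TABLE[access])
    return [sum(components) for components in zip(*vectors)]


def embed_instruction(annotated_tokens):
    """Embed a sequence of (token, kind, access) triples, one vector per token."""
    return [embed_token(*triple) for triple in annotated_tokens]


# "mov rax, rbx": rax is written and rbx is read.
instruction = [
    ("mov", "opcode", "none"),
    ("rax", "register", "write"),
    ("rbx", "register", "read"),
]
inputs = embed_instruction(instruction)
```

With this scheme the same register token gets a different input vector depending on whether it is read or written, which is the kind of assembly-level knowledge a plain token embedding cannot express on its own.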

Code Implementations: 1 repo
