LG | CL | ML | Dec 17, 2019

In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks

arXiv:1912.07946v3 | 4 citations
Originality: Synthesis-oriented
AI Analysis

The paper gives reverse engineers an automated way to assign human-like names to assembly functions, though it appears incremental, since it applies existing NLP methods such as Seq2Seq and Transformers to a new domain.

The paper tackles the problem of automatically naming assembly functions in stripped binaries and reports promising results that beat the state of the art, evaluated on a corpus of nearly 9 million functions from more than 22k software programs.

In this paper we investigate the problem of automatically naming pieces of assembly code, where by naming we mean assigning to an assembly function a string of words that a human reverse engineer would likely assign. We formally and precisely define the framework in which our investigation takes place: we define the problem and provide reasonable justifications for the choices we made in designing the training and the tests. We perform an analysis on a large real-world corpus of nearly 9 million functions taken from more than 22k software programs. In this framework we test baselines from the field of Natural Language Processing (e.g., Seq2Seq networks and Transformer). Interestingly, our evaluation shows promising results, beating the state of the art and reaching good performance. We also investigate the applicability of fine-tuning (i.e., taking a model already trained on a large generic corpus and retraining it for a specific task), a technique that is popular and well known in the NLP field. Our results confirm that fine-tuning is effective even when neural networks are applied to binaries: a model pre-trained on the aforementioned corpus performs better, once fine-tuned, on specific domains (such as predicting names in system utilities, malware, etc.).
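To make "a string of words" concrete, here is a minimal sketch of how a function symbol from an unstripped binary might be turned into a target word sequence for a Seq2Seq-style model. The abstract does not describe the authors' preprocessing, so the helper `name_to_words` and its splitting rules are hypothetical and for illustration only.

```python
import re
from typing import List


def name_to_words(symbol: str) -> List[str]:
    """Split a function symbol into lowercase words.

    Hypothetical preprocessing, not the paper's method: split on
    non-alphanumeric separators, then on camelCase and digit boundaries.
    """
    words: List[str] = []
    # First break on underscores and any other non-alphanumeric separator.
    for chunk in re.split(r"[^A-Za-z0-9]+", symbol):
        # Then split acronyms, capitalised or lowercase words, and digit runs.
        words.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
        )
    return [w.lower() for w in words]


if __name__ == "__main__":
    for sym in ("get_file_size", "parseHTTPHeader", "sha256_update"):
        print(sym, "->", name_to_words(sym))
```

Under these assumed rules, `get_file_size` maps to the words `get`, `file`, `size`, which is the kind of output sequence a naming model would be trained to produce from the function's assembly.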

Code Implementations: 1 repo
