CRPL | Dec 10, 2021

BCD: A Cross-Architecture Binary Comparison Database Experiment Using Locality Sensitive Hashing Algorithms

arXiv:2112.05492v1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem for reverse engineers: it provides a tool for quickly gaining context on unknown binary functions. It is incremental, since it builds on existing hashing algorithms.

The paper tackles the problem of identifying functions in binary executables without source code. It compares hashing functions for detecting similar lifted LLVM IR snippets and builds a cross-architecture binary code similarity search database on MinHash, which the authors chose over SimHash, SSDEEP, and TLSH. This summary does not report quantitative effectiveness figures.

Given a binary executable without source code, it is difficult to determine what each function in the binary does by reverse engineering it, and even harder without prior experience and context. In this paper, we compare the effectiveness of different hashing functions at detecting similar lifted snippets of LLVM IR code, and present the design and implementation of a framework for a cross-architecture binary code similarity search database that uses MinHash as the chosen hashing algorithm, over SimHash, SSDEEP and TLSH. The motivation is to help reverse engineers quickly gain context on functions in an unknown binary by comparing it against a database of known functions. The code for this project is open source and can be found at https://github.com/h4sh5/bcddb
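
To make the MinHash idea in the abstract concrete, here is a minimal sketch of how two lifted IR snippets could be compared. It is not the bcddb implementation: the function names, the whitespace tokenization, the shingle size `k`, the number of hash functions `num_perm`, and the sample IR strings are all assumptions chosen for illustration.

```python
import hashlib


def shingles(tokens, k=3):
    """Return the set of k-token shingles (as tuples) from a token list."""
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _seeded_hash(seed, shingle):
    """Hash one shingle to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=128, k=3):
    """Build a signature: the minimum hash of the shingle set per seeded hash."""
    shingle_set = shingles(tokens, k)
    return [min(_seeded_hash(seed, s) for s in shingle_set)
            for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(tokens_a, tokens_b, k=3):
    """Exact Jaccard similarity of the two shingle sets, for comparison."""
    a, b = shingles(tokens_a, k), shingles(tokens_b, k)
    return len(a & b) / len(a | b)


# Hypothetical lifted IR for the same function from two architectures.
ir_x86 = "%1 = load i32 %a %2 = load i32 %b %3 = add i32 %1 %2 ret i32 %3".split()
ir_arm = "%1 = load i32 %a %2 = load i32 %b %3 = add i32 %2 %1 ret i32 %3".split()

sig_x86 = minhash_signature(ir_x86)
sig_arm = minhash_signature(ir_arm)
print("estimated:", estimate_jaccard(sig_x86, sig_arm))
print("exact:    ", exact_jaccard(ir_x86, ir_arm))
```

The appeal of this scheme for a search database is that signatures have a fixed length regardless of function size, so stored functions can be compared slot by slot without revisiting the original IR. A production system would typically normalize register names and constants before shingling, which this sketch does not attempt.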

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
