PL · CR · SE · Dec 11, 2020

On the Generation of Disassembly Ground Truth and the Evaluation of Disassemblers

arXiv:2012.09155v1 · 9 citations · Has Code
AI Analysis

This work addresses the problem of reliable disassembler benchmarking for software transformation and security tasks by providing a standardized set of binaries and a method for generating disassembly ground truth.

This paper introduces a disassembly benchmark suite of 879 binaries from diverse projects and a novel ground truth generator that leverages compiler listing files. The authors used this system to evaluate four open-source disassemblers.

When a software transformation or software security task needs to analyze a given program binary, the first step is often disassembly. Since many modern disassemblers have become highly accurate on many binaries, we believe reliable disassembler benchmarking requires standardizing the set of binaries used and the disassembly ground truth about these binaries. This paper presents (i) a first version of our work-in-progress disassembly benchmark suite, which comprises 879 binaries from diverse projects compiled with multiple compilers and optimization settings, and (ii) a novel disassembly ground truth generator leveraging the notion of "listing files", which are broadly supported by Clang, GCC, ICC, and MSVC. In addition, it presents our evaluation of four prominent open-source disassemblers using this benchmark suite and a custom evaluation system. Our entire system and all generated data are maintained openly on GitHub to encourage community adoption.
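To make the idea of listing-based ground truth and disassembler scoring concrete, here is a minimal sketch. It is not the paper's generator or evaluation system. It assumes a simplified, hypothetical listing line shape `<lineno> <hex offset> <hex bytes> <asm text>`, loosely modeled on assembler listings. Real formats differ per compiler, and offsets are section-relative, so mapping them to addresses in the linked binary is out of scope here. The names `parse_listing`, `score`, and `Score` are illustrative only. Instruction starts recovered from the listing act as ground truth, and a disassembler's reported instruction starts are scored with precision and recall.

```python
from dataclasses import dataclass


@dataclass
class Score:
    precision: float
    recall: float
    f1: float


def parse_listing(lines):
    """Collect instruction start offsets from a simplified (hypothetical) listing.

    Lines lacking an offset/bytes field (labels, section switches) are skipped,
    as are assembler directives such as '.long', which emit data, not code.
    """
    offsets = set()
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # no offset/bytes columns: label or bare directive
        _lineno, offset, _raw_bytes, asm = fields
        if asm.startswith("."):
            continue  # data directive: these bytes are not instructions
        try:
            offsets.add(int(offset, 16))
        except ValueError:
            continue  # offset column is not hex; ignore the line
    return offsets


def score(truth, predicted):
    """Precision/recall/F1 of predicted instruction starts against ground truth."""
    true_pos = len(truth & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Score(precision, recall, f1)


# Hypothetical listing fragment: a 4-byte data word sits between instructions.
listing = [
    "   1              \t.text",
    "   2              \tmain:",
    "   3 0000 55       \tpushq %rbp",
    "   4 0001 4889E5   \tmovq %rsp, %rbp",
    "   5 0004 00000000 \t.long 0",
    "   6 0008 C3       \tret",
]

truth = parse_listing(listing)  # instruction starts at 0x0, 0x1, 0x8
disasm = {0x0, 0x1, 0x4, 0x8}   # hypothetical output that decoded the data word at 0x4 as code
print(score(truth, disasm))     # precision 0.75, recall 1.0
```

In this toy case the hypothetical disassembler misses nothing (recall 1.0) but reports one spurious instruction inside embedded data, which lowers precision. Data-in-code confusion of this kind is a common reason disassemblers disagree with ground truth.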
