LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics
For reverse engineers and security analysts, this work addresses the bottleneck of unreadable decompiled code. The results are incremental, however: the method is a composite of existing metrics, and the evaluation is limited to qualitative discussion.
The authors tackle the problem of improving the readability of decompiled C code using LLM agents. They propose the Quantitative Readability Score (QRS) framework, which combines structural similarity with three readability sub-metrics, and show that QRS-guided refinement enables targeted readability improvements without sacrificing correctness.
Automatic decompilers produce functionally correct but often unreadable C code. This paper addresses one stage of the reverse engineering workflow: improving the readability of decompiled code using LLM agents guided by quantitative metrics. We present a three-phase research evolution. Phase 1 (tool-driven steering via Ghidra MCP) suffered from incomplete coverage and inconsistent improvements due to a lack of quantitative guidance. Phase 2 (structural similarity validation alone) revealed that agents optimize for metrics in unintended ways, producing structurally equivalent but less readable code. Our contribution, the third phase, is the Quantitative Readability Score (QRS) framework, a composite metric combining a structural similarity gate with three independent readability sub-metrics (Lexical Surprisal, Structural Simplicity, and Idiomatic Quality). We demonstrate that QRS-guided refinement enables LLM agents to make targeted readability improvements without sacrificing correctness. We discuss the broader reverse engineering workflow (binary lifting, decompilation cleanup, and achieving functional equivalence) as context; however, it remains out of scope.
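The gate-plus-sub-metrics structure described above can be illustrated with a minimal sketch. The abstract does not give the QRS formula, so everything specific here is an assumption made for illustration: the names `SubScores` and `qrs`, the gate threshold of 0.9, the equal-weight mean, and the convention that each sub-metric is already normalized to [0, 1] with higher meaning more readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScores:
    """Hypothetical container for the three readability sub-metrics.

    Assumption: each value is already normalized to [0, 1], higher is better.
    """
    lexical_surprisal: float
    structural_simplicity: float
    idiomatic_quality: float


def qrs(structural_similarity: float,
        scores: SubScores,
        gate_threshold: float = 0.9) -> float:
    """Illustrative composite score, not the paper's actual formula.

    The structural similarity value acts as a gate: a candidate that
    drifts too far from the original structure scores zero, whatever
    its readability. The threshold value is an assumption.
    """
    if structural_similarity < gate_threshold:
        return 0.0
    # Assumption: equal weighting of the three sub-metrics.
    total = (scores.lexical_surprisal
             + scores.structural_simplicity
             + scores.idiomatic_quality)
    return total / 3.0
```

Under these assumptions, a refinement that fails the gate is rejected outright, so an agent cannot trade structural fidelity for a higher readability score, while a refinement that passes is ranked only by its readability sub-metrics.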