Zion Leonahenahe Basque

CR
h-index30
3papers
2citations
Novelty63%
AI Score50

3 Papers

CRMay 5
Root-Cause-Driven Automated Vulnerability Repair

Hulin Wang, Zion Leonahenahe Basque, Jie Hu et al.

Recent LLM-based systems have made automated vulnerability repair increasingly practical, but two challenges remain. First, without strong signals about where a bug originates, repair agents drift toward shallow edits that silence the observed failure while leaving the underlying defect unresolved. Second, finding the root cause for bugs is hard: even developers familiar with the codebase frequently produce fixes that address symptoms rather than the root cause, and LLM-based agents, operating with noisier context and less program understanding, are no exception. We present Kumushi, a root-cause-driven patching agent that addresses both challenges by combining diversified dynamic fault localization with evidence-weighted ranking to focus the LLM on the code most relevant to the defect. To rigorously measure whether Kumushi produces genuinely better patches, we also introduce a two-tier patch quality metric that pairs automated oracle validation with structured expert assessment of patches. Evaluated on 178 C/C++ vulnerabilities, Kumushi substantially outperforms prior specialized repair agents under automated evaluation while matching a frontier commercial coding agent. Expert assessment then reveals differences that oracles cannot: Kumushi produces more root-cause fixes and fewer superficial patches, and is preferred in the majority of decisive pairwise comparisons. Together, these results demonstrate that progress in automated vulnerability repair requires not only stronger patching systems, but also richer evaluation methods capable of distinguishing genuine fixes from oracle-passing ones.

CRMar 18
Pushan: Trace-Free Deobfuscation of Virtualization-Obfuscated Binaries

Ashwin Sudhir, Zion Leonahenahe Basque, Wil Gibbs et al.

In the ever-evolving battle against malware, binary obfuscation techniques are a formidable barrier to effective analysis by both human security analysts and automated systems. In particular, virtualization or VM-based obfuscation is one of the strongest protection mechanisms that evade automated analysis. Despite widespread use of virtualization, existing automated deobfuscation techniques suffer from three major drawbacks. First, they only work on execution traces, which prevents them from recovering all logic in an obfuscated binary. Second, they depend on dynamic symbolic execution, which is expensive and does not scale in practice. Third, they cannot generate "well-formed" code, which prevents existing binary decompilers from generating human-friendly output. This paper introduces PUSHAN, a novel and generic technique for deobfuscating virtualization-obfuscated binaries while overcoming the limitations of existing techniques. PUSHAN is trace-free and avoids path-constraint accumulation by using VPC-sensitive, constraint-free symbolic emulation to recover a complete CFG of the virtualized function. It is the first approach that also decompiles the protected code into high-quality C pseudocode to enable effective analysis. Crucially, PUSHAN circumvents reliance on path satisfiability, a known NP-hard problem that hampers scalability. We evaluate PUSHAN on more than 1,000 binaries, including targets protected by academic state of the art (Tigress) and commercial-strength obfuscators VMProtect and Themida. PUSHAN successfully deobfuscates these binaries, retrieves their complete CFGs, and decompiles them to C pseudocode. We further demonstrate applicability by analyzing a previously unanalyzed VMProtect-obfuscated malware sample from VirusTotal, where our decompiled output enables LLM-assisted code simplification, reuse, and program understanding.

SESep 27, 2025Code
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software

Zehua Zhang, Ati Priya Bajaj, Divij Handa et al.

Automatically compiling open-source software (OSS) projects is a vital, labor-intensive, and complex task, which makes it a good challenge for LLM Agents. Existing methods rely on manually curated rules and workflows, which cannot adapt to OSS that requires customized configuration or environment setup. Recent attempts using Large Language Models (LLMs) used selective evaluation on a subset of highly rated OSS, a practice that underestimates the realistic challenges of OSS compilation. In practice, compilation instructions are often absent, dependencies are undocumented, and successful builds may even require patching source files or modifying build scripts. We propose a more challenging and realistic benchmark, BUILD-BENCH, comprising OSS that are more diverse in quality, scale, and characteristics. Furthermore, we propose a strong baseline LLM-based agent, OSS-BUILD-AGENT, an effective system with enhanced build instruction retrieval module that achieves state-of-the-art performance on BUILD-BENCH and is adaptable to heterogeneous OSS characteristics. We also provide detailed analysis regarding different compilation method design choices and their influence to the whole task, offering insights to guide future advances. We believe performance on BUILD-BENCH can faithfully reflect an agent's ability to tackle compilation as a complex software engineering tasks, and, as such, our benchmark will spur innovation with a significant impact on downstream applications in the fields of software development and software security.