Eun-Sun Cho

CR
h-index: 9
3 papers
1 citation
Novelty: 57%
AI Score: 49

3 Papers

CR · May 29
A Core-Structure-Based Automated Analysis Tool for Commercial Virtualization Obfuscation Deobfuscation

Wanju Kim, Seoksu Lee, Eun-Sun Cho

Virtualization obfuscation is more powerful than other obfuscation techniques, and its growing use in malware demands significant time and effort from analysts. This study analyzes virtualization obfuscation and proposes a tool, VMPredator, that automatically extracts semantic units. The tool performs several analyses, including memory analysis and trace analysis, while minimizing its dependency on the internal structure of any specific virtual machine, so that it can handle diverse forms of virtualization obfuscation that existing tools cannot process. In experiments, the length of obfuscated programs was reduced by approximately 85%, and validation confirmed that small-scale programs were fully restored to semantics identical to the original.

CR · May 11
Towards LLM-Based Analysis of Virtualization-Obfuscated Code through Automated Data Generation

Sangjun An, Hyeyeon Park, Yejin Son et al.

Virtualization-based obfuscation produces extremely large and structurally complex binaries, posing challenges for LLM-based analysis due to input size limits and the need for large-scale labeled data. We address this by focusing on structural rather than full semantic analysis. Obfuscated binaries are decomposed into the largest semantically coherent units that fit within LLM constraints and are labeled according to their structural roles. We implement a static analysis framework to automate labeling and enable large-scale dataset generation. Our prototype shows strong performance on real-world virtualization obfuscators.
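As a rough illustration of fitting units within an input size limit, the sketch below greedily merges consecutive code blocks into chunks that stay under a token budget. It is a simplification under stated assumptions, not the framework described in the abstract: `pack_units`, the per-block token counts, and the budget are hypothetical, and the sketch bounds size only, without checking semantic coherence.

```python
def pack_units(block_sizes, budget):
    """Greedily merge consecutive blocks into chunks of at most `budget` tokens.

    `block_sizes` is a hypothetical list of token counts, one per code block.
    Returns a list of chunks, each a list of block indices.
    A single block larger than `budget` still gets its own chunk.
    """
    chunks, current, used = [], [], 0
    for index, size in enumerate(block_sizes):
        # Close the current chunk if adding this block would exceed the budget.
        if current and used + size > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        chunks.append(current)
    return chunks


print(pack_units([30, 50, 40, 90, 10], budget=100))
```

A real decomposition would also need to respect structural boundaries (for example, not splitting a handler across chunks), which this size-only sketch ignores.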

CR · Jun 30, 2025
gMBA: Expression Semantic Guided Mixed Boolean-Arithmetic Deobfuscation Using Transformer Architectures

Youjeong Noh, Joon-Young Paik, Jingun Kwon et al.

Mixed Boolean-Arithmetic (MBA) obfuscation protects intellectual property by converting programs into forms that are more complex to analyze. However, MBA has been increasingly exploited by malware developers to evade detection and cause significant real-world problems. Traditional MBA deobfuscation methods often treat these expressions as black boxes and overlook their internal semantic information. To bridge this gap, we propose a truth table, which is an automatically constructed semantic representation of an expression's behavior that does not rely on external resources. The truth table is a mathematical form that represents the output of an expression for all possible combinations of inputs. We also propose a general and extensible guided MBA deobfuscation framework (gMBA) that modifies a Transformer-based neural encoder-decoder Seq2Seq architecture to incorporate this semantic guidance. Experimental results and in-depth analysis show that integrating expression semantics significantly improves performance and highlight the importance of internal semantic expressions in recovering obfuscated code to its original form.
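To make the truth-table idea concrete, here is a minimal sketch that evaluates an expression on every combination of single-bit inputs. The helper `truth_table` and the two example expressions are assumptions for illustration; the abstract does not specify how gMBA constructs or encodes its tables.

```python
from itertools import product


def truth_table(expr, num_vars):
    """Return the outputs of `expr` for every 0/1 assignment of its inputs.

    `expr` is a Python callable taking `num_vars` integer arguments.
    Rows are ordered (0, 0), (0, 1), (1, 0), (1, 1) for two variables.
    """
    return [expr(*bits) for bits in product((0, 1), repeat=num_vars)]


# Hypothetical example pair: an MBA form and the simple expression it hides.
obfuscated = lambda x, y: x + y - 2 * (x & y)
simple = lambda x, y: x ^ y

print(truth_table(obfuscated, 2))  # [0, 1, 1, 0]
print(truth_table(simple, 2))      # [0, 1, 1, 0]
```

Matching tables on single-bit inputs are a necessary condition for two expressions to be equivalent, which is why such a table can serve as a compact semantic signal alongside the expression's syntax.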