Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis
The work gives software security analysts a static, generic deobfuscation tool, though the contribution is incremental because it builds on existing techniques.
The paper tackles the problem of detecting and removing opaque predicates in obfuscated code by combining binary analysis with machine learning, achieving up to 98% accuracy in experiments.
We present a new approach that bridges binary analysis techniques with machine learning classification to provide a static and generic evaluation technique for opaque predicates, regardless of their construction. We use this technique as a static, automated deobfuscation tool to remove the opaque predicates introduced by obfuscation mechanisms. According to our experimental results, our models reach up to 98% accuracy at detecting and deobfuscating state-of-the-art opaque predicate patterns. By contrast, the leading-edge deobfuscation methods based on symbolic execution are less accurate, mostly because of SMT solver constraints and the limited scalability of dynamic symbolic analyses. Our approach underlines the efficiency of combining hybrid symbolic analysis with machine learning techniques for a static and generic deobfuscation methodology.
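For readers unfamiliar with the term, the following minimal sketch illustrates what an opaque predicate is and why removing it simplifies code. It is a toy example, not the method presented in this paper: the function names (`always_true_predicate`, `obfuscated_abs`, `looks_opaque`) are hypothetical, and the sampling check is a naive stand-in for the classification step, not the approach described above.

```python
from typing import Callable, Iterable


def always_true_predicate(x: int) -> bool:
    # x * (x + 1) is the product of two consecutive integers,
    # so it is always even: this condition never evaluates to False.
    return (x * (x + 1)) % 2 == 0


def obfuscated_abs(x: int) -> int:
    # An obfuscator can guard the real code with an opaque predicate
    # and place junk code in the branch that is never taken.
    if always_true_predicate(x):
        return x if x >= 0 else -x
    return x ^ 0x5A5A  # dead branch, present only to confuse analysis


def plain_abs(x: int) -> int:
    # What the function looks like once the opaque predicate is removed.
    return x if x >= 0 else -x


def looks_opaque(predicate: Callable[[int], bool], samples: Iterable[int]) -> bool:
    # Naive heuristic (an assumption for illustration only): a predicate
    # that yields a single outcome on every sample is a candidate opaque
    # predicate. Sampling cannot prove opaqueness.
    outcomes = {predicate(s) for s in samples}
    return len(outcomes) == 1
```

The sampling heuristic can only suggest candidates. A predicate that is constant on the samples may still vary on untested inputs, which is one reason a generic static evaluation technique is of interest.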