BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection
This work addresses the challenge of identifying malicious or disruptive app behaviors for mobile security. It is an incremental improvement over prior detectors, with strong gains on the specific task it targets.
The paper tackles the problem of detecting undesired behaviors in Android apps, which are hard to catch because they are easily camouflaged and often do not rely on permission-protected APIs. It proposes BINCTX, a multi-modal representation learning approach that achieves a macro F1 of 94.73% on real-world data, outperforms baselines by at least 14.92%, and remains robust under obfuscation.
Mobile app markets host millions of apps, yet undesired behaviors (e.g., disruptive ads, illegal redirection, payment deception) remain hard to catch because they often do not rely on permission-protected APIs and can be easily camouflaged via UI or metadata edits. We present BINCTX, a learning approach that builds multi-modal representations of an app from (i) a global bytecode-as-image view that captures code-level semantics and family-style patterns, (ii) a contextual view (manifested actions, components, declared permissions, URL/IP constants) indicating how behaviors are triggered, and (iii) a third-party-library usage view summarizing invocation frequencies along inter-component call paths. The three views are embedded and fused to train a context-aware classifier. On real-world malware and benign apps, BINCTX attains a macro F1 of 94.73%, outperforming strong baselines by at least 14.92%. It remains robust under commercial obfuscation (F1 of 84% after obfuscation) and is more resistant to adversarial samples than state-of-the-art bytecode-only systems.
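To make the three-view idea concrete, the following is a minimal sketch of how such views could be built and fused. It is not the authors' implementation: the function names (`bytes_to_image`, `multi_hot`, `frequency_vector`, `fuse`), the fixed image width, the flatten-and-scale step standing in for a learned image encoder, and fusion by plain concatenation are all assumptions made for illustration, as are the example vocabulary and library names.

```python
import math
from collections import Counter
from typing import List, Sequence


def bytes_to_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay raw bytes out row by row as a grayscale image (one byte = one pixel).

    The last row is zero-padded. The width of 16 is an arbitrary choice here.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = math.ceil(len(data) / width)
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def embed_image(image: List[List[int]]) -> List[float]:
    """Stand-in for a learned encoder: flatten and scale pixels to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


def multi_hot(items: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Contextual view: 1.0 for each vocabulary entry the app declares."""
    present = set(items)
    return [1.0 if entry in present else 0.0 for entry in vocabulary]


def frequency_vector(calls: Sequence[str], libraries: Sequence[str]) -> List[float]:
    """Library view: relative invocation frequency per known library."""
    counts = Counter(calls)
    total = sum(counts[lib] for lib in libraries)
    return [counts[lib] / total if total else 0.0 for lib in libraries]


def fuse(*views: Sequence[float]) -> List[float]:
    """Fuse views by concatenation (an assumed, simplest-possible fusion)."""
    return [value for view in views for value in view]


# Hypothetical inputs, for illustration only.
image = bytes_to_image(bytes([0, 255, 16, 32, 7]), width=4)
context = multi_hot(
    ["android.permission.INTERNET"],
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
)
libs = frequency_vector(
    ["ads_sdk", "ads_sdk", "pay_sdk"],
    ["ads_sdk", "pay_sdk", "push_sdk"],
)
features = fuse(embed_image(image), context, libs)
```

The fused vector `features` would then be passed to a classifier. In a real system each view would more plausibly go through its own learned embedding before fusion; concatenating raw features is used here only to keep the sketch short.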