CR AI · May 25, 2019

Transferable Cost-Aware Security Policy Implementation for Malware Detection Using Deep Reinforcement Learning

Yoni Birman, Shaked Hindi, Gilad Katz, Asaf Shabtai

arXiv:1905.10517v2 · 10.95 citations

Originality: Incremental advance

AI Analysis

The work addresses the cost and efficiency challenges facing organizations that use malware detection ensembles, and it offers a transferable, adaptable solution. The advance is incremental, since it builds on existing reinforcement learning and ensemble methods.

The study tackled the computational cost of ensemble malware detection by proposing SPIREL, a reinforcement learning method that dynamically assigns detectors and thresholds per file, achieving either superior accuracy with modest efficiency gains or an ~80% reduction in runtime with only a 0.5% drop in accuracy and F1-score.

Malware detection is an ever-present challenge for all organizational gatekeepers, who must maintain high detection rates while minimizing interruptions to the organization's workflow. To improve detection rates, organizations often deploy an ensemble of detectors. While effective, this approach is computationally expensive, since every file, even a clear-cut case, needs to be analyzed by all detectors. Moreover, with an ever-increasing number of files to process, the use of ensembles may incur unacceptable processing times and costs (e.g., cloud resources). In this study, we propose SPIREL, a reinforcement learning-based method for cost-effective malware detection. Our method enables organizations to directly associate costs with correct/incorrect classification, computing resources and run-time, and then dynamically establishes a security policy. This security policy is then implemented, and for each inspected file, a different set of detectors is assigned and a different detection threshold is set. Our evaluation on two malware domains, Portable Executable (PE) and Android Application Package (APK) files, shows that SPIREL is both accurate and extremely resource-efficient: the proposed method either outperforms the best performing baselines while achieving a modest improvement in efficiency, or reduces the required running time by ~80% while decreasing the accuracy and F1-score by only 0.5%. We also show that our approach is both highly transferable across different datasets and adaptable to changes in individual detector performance.
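The cost trade-off described in the abstract can be illustrated with a small sketch. This is not SPIREL itself: the learned reinforcement learning policy is replaced here by a hand-written stopping rule, and every name and number (`inspect_file`, `episode_reward`, the `low`/`high` thresholds, the reward and time-cost values) is a hypothetical choice made for illustration. It only shows how querying detectors one at a time lets clear-cut files exit early, and how a reward could combine classification correctness with run-time.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical detector table: name -> (runtime in seconds, scoring function).
# A scoring function returns a maliciousness score in [0, 1].
Detector = Tuple[float, Callable[[dict], float]]


def inspect_file(
    file_features: dict,
    detectors: Dict[str, Detector],
    low: float = 0.2,   # assumed: mean score at or below this is "clearly benign"
    high: float = 0.8,  # assumed: mean score at or above this is "clearly malicious"
) -> Tuple[str, List[str], float]:
    """Query detectors cheapest-first and stop once the mean score is decisive.

    This fixed rule stands in for a learned policy; it is an assumption of
    this sketch, not the method from the paper.
    """
    if not detectors:
        raise ValueError("at least one detector is required")

    scores: List[float] = []
    used: List[str] = []
    elapsed = 0.0
    # Sort by runtime so the cheapest detector is consulted first.
    for name, (runtime, score_fn) in sorted(detectors.items(), key=lambda kv: kv[1][0]):
        scores.append(score_fn(file_features))
        used.append(name)
        elapsed += runtime
        mean = sum(scores) / len(scores)
        if mean <= low or mean >= high:
            break  # clear-cut case: skip the remaining detectors

    mean = sum(scores) / len(scores)
    verdict = "malicious" if mean >= 0.5 else "benign"
    return verdict, used, elapsed


def episode_reward(
    verdict: str,
    truth: str,
    elapsed: float,
    correct: float = 1.0,            # assumed reward for a correct verdict
    wrong: float = -1.0,             # assumed penalty for an incorrect verdict
    cost_per_second: float = 0.01,   # assumed price of one second of run-time
) -> float:
    """Combine classification outcome and run-time into a single reward."""
    base = correct if verdict == truth else wrong
    return base - cost_per_second * elapsed
```

With two hypothetical detectors, one taking 1 second and one taking 30 seconds, a file that the fast detector scores at 0.95 is classified after 1 second, while a file it scores at 0.5 is passed on to the slow detector as well. Raising `cost_per_second` makes the extra 30 seconds more expensive relative to a misclassification, which is the kind of organizational preference the abstract says can be encoded as costs.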
