CRLG · Jul 17, 2019

Dynamic Malware Analysis with Feature Engineering and Feature Learning

arXiv:1907.07352v5 · 154 citations
Originality: Highly original
AI Analysis

This work addresses malware detection for cybersecurity. It offers a more efficient and accurate solution than existing methods that rely on complex feature engineering.

The paper tackles dynamic malware detection. It proposes a novel feature extraction method based on feature hashing and a deep neural network built from Gated-CNNs and a bidirectional LSTM, and it reports significant performance improvements over baselines on a large real dataset.
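Feature hashing turns a variable, open-ended set of API argument values (file paths, registry keys, flags) into a fixed-length vector without building a vocabulary. The sketch below shows the generic signed hashing trick applied to one API call. It is not the authors' encoder. The `key=value` tokenisation, the 8/16 bin sizes, the MD5-based hash, and the helper names `stable_hash`, `hash_features`, and `encode_api_call` are all illustrative assumptions, and the sample call record is made up.

```python
import hashlib

def stable_hash(token: str) -> int:
    # Python's built-in hash() is randomised per process, so use a fixed digest instead.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "little")

def hash_features(tokens, dim):
    """Signed hashing trick: map a bag of string tokens into a fixed-length vector."""
    vec = [0.0] * dim
    for tok in tokens:
        h = stable_hash(tok)
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0  # top bit picks the sign
        vec[h % dim] += sign                         # remainder picks the bin
    return vec

def encode_api_call(api_name, args, name_dim=8, arg_dim=16):
    """Hypothetical encoding of one API call: [hashed name | hashed key=value arguments]."""
    name_vec = hash_features([api_name], name_dim)
    # Pair each value with its argument name so equal values under different keys can land in different bins.
    arg_tokens = [f"{key}={value}" for key, value in args.items()]
    return name_vec + hash_features(arg_tokens, arg_dim)

# Made-up sandbox record, for illustration only.
call = encode_api_call("NtCreateFile", {"FileName": "C:\\Windows\\Temp\\a.exe", "DesiredAccess": "0x40000000"})
print(len(call))  # 24
```

The vector length stays at `name_dim + arg_dim` no matter how many distinct paths or values occur in the traces. The price is occasional hash collisions, which the random sign partly offsets.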

Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g. system API calls) for malware detection. This technique has been proven to be effective against various code obfuscation techniques and newly released ("zero-day") malware. However, existing works typically consider only the API name and ignore the arguments, or they require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel and low-cost feature extraction approach, and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes a feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated-CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through bidirectional LSTM (long short-term memory) networks to learn the sequential correlation among API calls. Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.
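The abstract describes a pipeline that runs gated convolutions over the per-call feature vectors and then a bidirectional LSTM over the sequence. The forward pass below is a minimal pure-Python sketch of that general shape, not the paper's model. Several details are assumptions chosen for illustration: two gated convolutions with kernel sizes 3 and 5 concatenated per step, global max pooling over time, a single sigmoid output, and untrained random weights. The helper names are also hypothetical. A real implementation would use a deep learning framework and train the weights.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Multiply a row-major matrix by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def gated_conv1d(seq, WA, WB, k):
    """Gated 1-D convolution over time: conv_A(x) * sigmoid(conv_B(x)), odd kernel k, zero 'same' padding."""
    d = len(seq[0])
    pad = k // 2
    padded = [[0.0] * d] * pad + seq + [[0.0] * d] * pad
    out = []
    for t in range(len(seq)):
        window = [v for step in padded[t:t + k] for v in step]  # k consecutive API-call vectors, flattened
        a, b = matvec(WA, window), matvec(WB, window)
        out.append([ai * sigmoid(bi) for ai, bi in zip(a, b)])  # sigmoid branch gates the linear branch
    return out

def lstm(seq, W, hidden):
    """Single-direction LSTM; W maps [x_t, h_prev, 1] to stacked input/forget/output/candidate pre-activations."""
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x in seq:
        z = matvec(W, x + h + [1.0])  # trailing 1.0 acts as the bias input
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs

def bilstm(seq, W_fwd, W_bwd, hidden):
    # Run one LSTM left-to-right and one right-to-left, then concatenate per time step.
    fwd = lstm(seq, W_fwd, hidden)
    bwd = lstm(seq[::-1], W_bwd, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

def init_params(in_dim, conv_ch, hidden, rng):
    lstm_in = 2 * conv_ch + hidden + 1
    return {
        "conv3": (rand_matrix(conv_ch, 3 * in_dim, rng), rand_matrix(conv_ch, 3 * in_dim, rng)),
        "conv5": (rand_matrix(conv_ch, 5 * in_dim, rng), rand_matrix(conv_ch, 5 * in_dim, rng)),
        "lstm_fwd": rand_matrix(4 * hidden, lstm_in, rng),
        "lstm_bwd": rand_matrix(4 * hidden, lstm_in, rng),
        "out": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden + 1)],
    }

def malware_probability(seq, p, hidden):
    c3 = gated_conv1d(seq, *p["conv3"], 3)
    c5 = gated_conv1d(seq, *p["conv5"], 5)
    feats = [a + b for a, b in zip(c3, c5)]            # concatenate both gated-conv outputs per call
    states = bilstm(feats, p["lstm_fwd"], p["lstm_bwd"], hidden)
    pooled = [max(col) for col in zip(*states)]        # global max pooling over the sequence
    logit = sum(w * v for w, v in zip(p["out"], pooled + [1.0]))
    return sigmoid(logit)

rng = random.Random(0)
IN_DIM, CONV_CH, HIDDEN = 24, 8, 6
params = init_params(IN_DIM, CONV_CH, HIDDEN, rng)
trace = [[rng.gauss(0.0, 1.0) for _ in range(IN_DIM)] for _ in range(10)]  # 10 fake API-call vectors
print(malware_probability(trace, params, HIDDEN))  # untrained weights, so the score is meaningless
```

In this sketch the convolutions see only a few neighbouring calls at a time. The bidirectional LSTM then carries context across the whole trace in both directions before pooling collapses it to a single score.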

Code implementations: 2 repos
