Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection
This work provides a significant improvement in memory efficiency and training speed for deep learning models processing extremely long sequences, which is crucial for cybersecurity applications like malware detection.
This paper addresses the challenge of classifying extremely long sequences, specifically in malware detection, where inputs can exceed 100 million steps. The authors develop a new temporal max pooling approach whose memory use is constant with respect to sequence length, making MalConv 116x more memory efficient and up to 25.8x faster to train. They also introduce a Global Channel Gating design for efficient feature interaction across long sequences.
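The constant-memory temporal max pooling idea can be illustrated with a minimal pure-Python sketch. It rests on assumptions that are ours, not the paper's: the input is a single real-valued channel (standing in for embedded bytes), convolution windows do not overlap (stride equal to kernel width), the activation is ReLU, and the pooling is a global max over time. The names `conv1d_relu` and `chunked_max_pool` are hypothetical and not taken from the MalConv2 repository. The input is processed `chunk_windows` windows at a time, and only a running maximum and its window index are kept per channel, so peak activation memory depends on the chunk size rather than on $T$. Because max pooling passes gradient only to the winning position, a training version could recompute just the recorded winning windows with gradients enabled; that second step is omitted here.

```python
import random
from typing import List, Sequence, Tuple


def conv1d_relu(x: Sequence[float], weights: List[List[float]], stride: int) -> List[List[float]]:
    """Unpadded strided 1D convolution followed by ReLU.

    Returns activations indexed as [channel][window].
    """
    k = len(weights[0])
    n_out = (len(x) - k) // stride + 1 if len(x) >= k else 0
    return [
        [max(0.0, sum(w[j] * x[p * stride + j] for j in range(k))) for p in range(n_out)]
        for w in weights
    ]


def chunked_max_pool(
    x: Sequence[float], weights: List[List[float]], chunk_windows: int
) -> List[Tuple[float, int]]:
    """Global temporal max pooling computed one chunk at a time.

    Assumes stride == kernel width, so chunks cut on window boundaries need
    no overlap. Only `chunk_windows` activations per channel exist at once;
    for each channel we keep (max activation, index of the winning window).
    """
    k = len(weights[0])
    n_windows = len(x) // k  # trailing bytes that do not fill a window are dropped
    best = [(float("-inf"), -1) for _ in weights]
    for start in range(0, n_windows, chunk_windows):
        stop = min(start + chunk_windows, n_windows)
        # In a real system this slice would be streamed from disk, not held in RAM.
        acts = conv1d_relu(x[start * k : stop * k], weights, stride=k)
        for c, row in enumerate(acts):
            for i, v in enumerate(row):
                if v > best[c][0]:  # strict '>' keeps the first maximum on ties
                    best[c] = (v, start + i)
    return best


# Check against the O(T)-memory version that materialises every window.
random.seed(0)
x = [random.uniform(-1, 1) for _ in range(1000)]                  # T = 1000
w = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(3)]  # 3 channels, width 5
full = conv1d_relu(x, w, stride=5)
expected = [(max(r), max(range(len(r)), key=r.__getitem__)) for r in full]
for chunk in (1, 7, 64, 500):
    assert chunked_max_pool(x, w, chunk) == expected
```

The chunked and full versions compute the same dot products in the same order, so their results agree exactly; only the number of activations alive at any moment differs.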
Recent work in machine learning has tackled inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100{,}000{,}000$ steps. To date, the closest approach to handling such a task is MalConv, a convolutional neural network capable of processing up to $T=2{,}000{,}000$ steps. The $\mathcal{O}(T)$ memory cost of CNNs has prevented their further application to malware. In this work, we develop a new approach to temporal max pooling that makes the required memory invariant to the sequence length $T$. This makes MalConv $116\times$ more memory efficient and up to $25.8\times$ faster to train on its original dataset, while removing MalConv's input length restrictions. We reinvest these gains in improving the MalConv architecture by developing a new Global Channel Gating design, giving us an attention mechanism capable of efficiently learning feature interactions across 100 million time steps, a capability the original MalConv CNN lacked. Our implementation can be found at https://github.com/NeuromorphicComputationResearchProgram/MalConv2
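The Global Channel Gating idea can likewise be sketched under explicit assumptions. The abstract does not give the gating equation, so the form below is our own illustration rather than the paper's definition: a global context vector `context` (for example, the output of a separate max-pooled branch) is projected by a hypothetical matrix `W` into a query $q = Wg$, each time step's feature vector $h_t$ is scaled by $\sigma(h_t^\top q)$, and the gated features are max-pooled over time. Once $q$ is known, each step is gated and pooled independently, so the loop consumes an iterator and keeps only a running maximum; its memory does not grow with $T$.

```python
import math
from typing import Iterable, List, Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_max_pool(
    steps: Iterable[Sequence[float]],
    context: Sequence[float],
    W: List[List[float]],
) -> List[float]:
    """Gate each time step with a global context vector, then max-pool over time.

    steps   : iterable of feature vectors h_t, each of length C
    context : global context vector g of length D
    W       : hypothetical C x D projection, so q = W g has length C
    Only the running per-channel maximum is stored, never the whole sequence.
    """
    q = [sum(w_cd * g_d for w_cd, g_d in zip(row, context)) for row in W]
    pooled: Optional[List[float]] = None
    for h in steps:
        s = sigmoid(sum(h_c * q_c for h_c, q_c in zip(h, q)))  # one scalar gate per step
        gated = [s * h_c for h_c in h]
        pooled = gated if pooled is None else [max(a, b) for a, b in zip(pooled, gated)]
    if pooled is None:
        raise ValueError("empty sequence")
    return pooled


# With W = 0 every gate is sigmoid(0) = 0.5, so the result is half the plain max-pool.
print(gated_max_pool(iter([[1.0, -2.0], [3.0, 0.5]]), context=[1.0], W=[[0.0], [0.0]]))
# -> [1.5, 0.25]
```

A scalar gate per step keeps the sketch short; a per-channel gate would be an equally plausible reading of "channel gating" and would change only the line computing `s`.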