CR AI LG Apr 1, 2019

ScriptNet: Neural Static Analysis for Malicious JavaScript Detection

Jack W. Stokes, Rakshit Agrawal, Geoff McDonald, Matthew Hausknecht

arXiv:1904.01126v18.318 citations

Originality: Incremental advance

AI Analysis

This paper addresses the threat that malicious scripts pose to web security, but the advance is incremental: it builds on existing neural methods with a new model variant.

The paper tackles the problem of detecting malicious JavaScript files using static analysis for web-scale processing, achieving a 97.20% true positive rate at a 0.50% false positive rate on a dataset of 212,408 files.
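A figure such as "97.20% TPR at 0.50% FPR" is read off by fixing a score threshold so that at most the allowed fraction of benign files is flagged, then measuring how many malicious files exceed that threshold. The sketch below illustrates that calculation in plain Python. The function name `tpr_at_fpr`, the label convention (1 = malicious, 0 = benign), and the toy scores are assumptions made for illustration, not details from the paper.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return (threshold, tpr, fpr) for the lowest threshold with FPR <= max_fpr.

    A file is flagged as malicious when its score is strictly greater
    than the threshold. Labels: 1 = malicious, 0 = benign (assumed).
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Number of benign files we are allowed to flag at this FPR budget.
    allowed = int(max_fpr * len(benign))

    # Use the (allowed + 1)-th highest benign score as the threshold, so at
    # most `allowed` benign scores are strictly above it (ties stay below).
    threshold = benign[allowed] if allowed < len(benign) else float("-inf")

    false_pos = sum(s > threshold for s in benign)
    true_pos = sum(s > threshold for s in malicious)
    return threshold, true_pos / len(malicious), false_pos / len(benign)


# Toy data, made up for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
threshold, tpr, fpr = tpr_at_fpr(scores, labels, max_fpr=0.25)
```

On a real evaluation set the same call would use `max_fpr=0.005`. With only a few hundred benign files that budget rounds down to zero allowed false positives, which is one reason such operating points are reported on large corpora.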

Malicious scripts are an important computer infection threat vector in the wild. For web-scale processing, static analysis offers substantial computing efficiencies. We propose the ScriptNet system for neural malicious JavaScript detection, which is based on static analysis. We use the Convoluted Partitioning of Long Sequences (CPoLS) model, which processes JavaScript files as byte sequences. Lower layers capture the sequential nature of these byte sequences, while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our model variants are trained in an end-to-end fashion, allowing discriminative training even for the sequential processing layers. Evaluating this model on a large corpus of 212,408 JavaScript files indicates that the best performing CPoLS model offers a 97.20% true positive rate (TPR) for the first 60K byte subsequence at a false positive rate (FPR) of 0.50%. The best performing CPoLS model significantly outperforms several baseline models.
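The abstract says the model reads JavaScript files as byte sequences and reports results for the first 60K bytes, and the model's name suggests that long sequences are partitioned before processing. The sketch below shows only what such a preprocessing step might look like; it is not the authors' implementation and does not include the neural layers. The chunk length, the padding value, the reading of "60K" as 60,000 bytes, and the function name `partition_bytes` are all assumptions.

```python
MAX_BYTES = 60_000  # "first 60K bytes" from the abstract; exact value assumed
CHUNK_LEN = 1_000   # hypothetical chunk length, not taken from the paper
PAD = 0             # hypothetical padding value (collides with a real 0x00 byte)


def partition_bytes(data, max_bytes=MAX_BYTES, chunk_len=CHUNK_LEN, pad=PAD):
    """Truncate a file to max_bytes and split it into equal-length chunks.

    Each chunk is a list of integer byte values (0-255). The final chunk is
    right-padded with `pad` so that every chunk has length chunk_len.
    """
    prefix = data[:max_bytes]
    chunks = []
    for start in range(0, len(prefix), chunk_len):
        chunk = list(prefix[start:start + chunk_len])
        chunk += [pad] * (chunk_len - len(chunk))  # pad only the last chunk
        chunks.append(chunk)
    return chunks


# Tiny example with a 3-byte chunk length so the result is easy to inspect.
example = partition_bytes(b"var x=1;", chunk_len=3)
```

A real pipeline would likely reserve a padding symbol outside the 0-255 range so that padding cannot be confused with a genuine null byte, and would then feed each chunk to the lower, sequence-processing layers the abstract describes.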
