CR, LG · Feb 27, 2017

eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys

arXiv:1702.08568v1 · 221 citations
Originality: Incremental advance
AI Analysis

This work addresses the engineering burden of security machine learning systems by automating feature design for detecting malicious indicators. It appears incremental, however, since it applies deep learning to a specific domain.

The paper tackles the problem of automating feature extraction for security machine learning systems by proposing eXpose, a character-level convolutional neural network that processes raw short character strings such as URLs and file paths. The results show that eXpose outperforms baselines built on manual feature extraction, achieving a 5%-10% detection rate gain at a 0.1% false positive rate on all the intrusion detection problems tested.
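The headline metric, detection rate at a fixed 0.1% false positive rate, can be computed from a model's scores on labeled benign and malicious samples. Below is a minimal sketch in plain Python; the function name `detection_rate_at_fpr` and the "flag when the score is strictly above the threshold" convention are assumptions for illustration, not the paper's evaluation code.

```python
def detection_rate_at_fpr(scores, labels, max_fpr=0.001):
    """Highest detection rate (true positive rate) reachable while keeping
    the false positive rate at or below max_fpr.

    scores: model outputs, higher means "more likely malicious".
    labels: 1 for malicious, 0 for benign.
    (Hypothetical helper, written for illustration only.)
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need both benign and malicious samples")

    # How many benign samples we may flag without exceeding max_fpr.
    allowed_fp = int(max_fpr * len(benign))

    # A sample is flagged when its score is strictly greater than the threshold.
    # Using the allowed_fp-th highest benign score (0-indexed) as the threshold
    # flags at most allowed_fp benign samples.
    if allowed_fp >= len(benign):
        threshold = float("-inf")
    else:
        threshold = benign[allowed_fp]

    detected = sum(1 for s in malicious if s > threshold)
    return detected / len(malicious)
```

Comparing two detectors then amounts to comparing their return values at the same `max_fpr`, computed on the same labeled test set.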

For years security machine learning research has promised to obviate the need for signature-based detection by automatically learning to detect indicators of attack. Unfortunately, this vision hasn't come to fruition: in fact, developing and maintaining today's security machine learning systems can require engineering resources comparable to those of signature-based detection systems, due in part to the need to develop and continuously tune the "features" these machine learning systems look at as attacks evolve. Deep learning, a subfield of machine learning, promises to change this by operating on raw input signals and automating the process of feature design and extraction. In this paper we propose the eXpose neural network, which uses a deep learning approach we have developed to take generic, raw short character strings as input (a common case for security inputs, which include artifacts like potentially malicious URLs, file paths, named pipes, named mutexes, and registry keys), and learns to extract features and classify simultaneously, using character-level embeddings and a convolutional neural network. In addition to completely automating the feature design and extraction process, eXpose outperforms baselines based on manual feature extraction on all of the intrusion detection problems we tested it on, yielding a 5%-10% detection rate gain at a 0.1% false positive rate compared to these baselines.
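The abstract describes the pipeline only at a high level: characters are mapped to learned embeddings, convolutional filters slide over the embedded sequence to extract features, and a classifier turns those features into a maliciousness score. The following is a minimal, untrained forward-pass sketch of that idea in plain Python. It assumes a printable-ASCII vocabulary, fixed-length zero padding, filter widths 2 to 5, ReLU followed by sum-pooling, and a single logistic output unit. All sizes and names (`encode`, `init_params`, `forward`, `MAX_LEN`, and so on) are hypothetical, not taken from the paper; a real model would learn every weight end-to-end, and the actual eXpose architecture may differ in depth, layer sizes, and pooling.

```python
import math
import random

# All hyperparameters below are illustrative assumptions, kept tiny.
MAX_LEN = 200                 # strings are truncated/padded to this length
EMBED_DIM = 8                 # size of each character embedding
KERNEL_WIDTHS = (2, 3, 4, 5)  # convolution widths (characters per window)
FILTERS_PER_WIDTH = 4         # filters per width
VOCAB_SIZE = 97               # 0 = padding, 1..95 = printable ASCII, 96 = unknown


def encode(s, max_len=MAX_LEN):
    """Map each character to an integer id and pad with 0 up to max_len."""
    ids = []
    for ch in s[:max_len]:
        code = ord(ch)
        # Printable ASCII 32..126 -> 1..95; anything else -> 96 (unknown).
        ids.append(code - 31 if 32 <= code <= 126 else 96)
    return ids + [0] * (max_len - len(ids))


def init_params(seed=0):
    """Random, untrained weights; training would fit these by backprop."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)
    embedding[0] = [0.0] * EMBED_DIM  # padding id maps to a zero vector
    kernels = {w: [rand_matrix(w, EMBED_DIM) for _ in range(FILTERS_PER_WIDTH)]
               for w in KERNEL_WIDTHS}
    n_features = len(KERNEL_WIDTHS) * FILTERS_PER_WIDTH
    out_weights = [rng.gauss(0.0, 0.1) for _ in range(n_features)]
    return embedding, kernels, out_weights, 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(s, params):
    """Score a raw string: embed -> convolve -> ReLU -> sum-pool -> logistic."""
    embedding, kernels, out_weights, out_bias = params
    x = [embedding[i] for i in encode(s)]  # MAX_LEN rows of EMBED_DIM values
    features = []
    for w in KERNEL_WIDTHS:
        for kernel in kernels[w]:
            total = 0.0
            # Slide the (bias-free) kernel over every window of w characters.
            for t in range(len(x) - w + 1):
                act = sum(kernel[k][d] * x[t + k][d]
                          for k in range(w) for d in range(EMBED_DIM))
                total += max(0.0, act)  # ReLU, then sum over positions
            features.append(total)
    logit = out_bias + sum(wi * fi for wi, fi in zip(out_weights, features))
    return sigmoid(logit)
```

For example, `forward("http://example.com/login.php", init_params())` returns a score strictly between 0 and 1, although with untrained weights it carries no meaning. Because the padding embedding is a zero vector and the convolutions have no bias, an empty string produces a logit of 0 and a score of exactly 0.5.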

Code Implementations: 2 repos
