Packet2Vec: Utilizing Word2Vec for Feature Extraction in Packet Data
This work addresses the problem of reducing human effort in feature engineering for network security. The contribution is incremental, since it applies an existing method (Word2Vec) to a new domain.
The paper tackles automatic feature extraction from raw network packet data for intrusion detection by adapting Word2Vec, achieving AUC-ROC scores of 0.988 to 0.996 and AUC-PR scores of 0.604 to 0.667 on a 2009 DARPA network data set.
One of deep learning's attractive benefits is the ability to automatically extract relevant features for a target problem from largely raw data, instead of relying on human-engineered, error-prone handcrafted features. While deep learning has shown success in fields such as image classification and natural language processing, its application to feature extraction from raw network packet data for intrusion detection is largely unexplored. In this paper we modify a Word2Vec approach, originally used for text processing, and apply it to packet data for automatic feature extraction. We call this approach Packet2Vec. For the classification task of benign versus malicious traffic on a 2009 DARPA network data set, we obtain an area under the curve (AUC) of the receiver operating characteristic (ROC) between 0.988 and 0.996 and an AUC of the precision-recall curve between 0.604 and 0.667.
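To make the idea of applying a text model to packets concrete, the following is a minimal sketch of one way raw packet bytes could be turned into "words" and then into a fixed-length feature vector. The abstract does not specify the tokenization or pooling, so several things here are assumptions for illustration only: the overlapping byte n-gram tokenization, the averaging step, the function names `packet_to_words` and `packet_vector`, and the hand-written toy embeddings (a real pipeline would learn them with a Word2Vec implementation).

```python
from typing import Dict, List, Sequence


def packet_to_words(packet: bytes, n: int = 2) -> List[str]:
    """Slide a window of n bytes over the packet; each window is one 'word'.

    Assumption: overlapping byte n-grams, written as lowercase hex strings.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [packet[i:i + n].hex() for i in range(len(packet) - n + 1)]


def packet_vector(
    words: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the embedding vectors of known words; unknown words are skipped.

    Assumption: mean pooling yields one fixed-length feature vector per packet.
    """
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


# Toy example: a hypothetical 4-byte packet and hand-written 2-d embeddings.
words = packet_to_words(bytes([0x45, 0x00, 0x00, 0x3C]))
toy_embeddings = {"4500": [1.0, 0.0], "0000": [0.0, 1.0]}
features = packet_vector(words, toy_embeddings, dim=2)
```

In this sketch, the list of words for each packet plays the role of a sentence in text processing, and the resulting `features` vector is what a downstream benign-versus-malicious classifier would consume.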