LG · CR · Apr 29, 2020

Packet2Vec: Utilizing Word2Vec for Feature Extraction in Packet Data

arXiv:2004.14477v1 · 32 citations
AI Analysis

This paper addresses the problem of reducing human effort in feature engineering for network security. The contribution is incremental: it applies an existing method, Word2Vec, to a new domain.

The paper tackles automatic feature extraction from raw network packet data for intrusion detection by adapting Word2Vec, reporting AUC-ROC scores of 0.988-0.996 and AUC-PR scores of 0.604-0.667 on a 2009 DARPA data set.
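The gap between the AUC-ROC and AUC-PR figures quoted above is typical when malicious traffic is a small fraction of the data. The sketch below illustrates this on invented toy scores, not the paper's data or evaluation code. The function names `roc_auc` and `average_precision` are hypothetical helpers, and average precision is used here as one common estimate of the area under the Precision/Recall curve; the paper may compute that area differently.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Assumes no tied scores, which holds for the toy data below.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy, heavily imbalanced data (an assumption for illustration only):
# 100 benign flows with scores 0.00 .. 0.99 and 2 malicious flows.
labels = [0] * 100 + [1, 1]
scores = [i / 100 for i in range(100)] + [0.995, 0.945]

print(f"AUC-ROC: {roc_auc(labels, scores):.3f}")
print(f"Average precision: {average_precision(labels, scores):.3f}")
```

Here only five benign flows outrank one of the two malicious flows, so AUC-ROC stays near 1, yet precision at that flow's rank drops to 2/7 and pulls the average precision down. With few positives, a handful of false alarms is enough to separate the two metrics.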

One of deep learning's attractive benefits is the ability to automatically extract relevant features for a target problem from largely raw data, instead of utilizing human-engineered, error-prone handcrafted features. While deep learning has shown success in fields such as image classification and natural language processing, its application for feature extraction on raw network packet data for intrusion detection is largely unexplored. In this paper we modify a Word2Vec approach, used for text processing, and apply it to packet data for automatic feature extraction. We call this approach Packet2Vec. For the classification task of benign versus malicious traffic on a 2009 DARPA network data set, we obtain an area under the curve (AUC) of the receiver operating characteristic (ROC) between 0.988 and 0.996 and an AUC of the Precision/Recall curve between 0.604 and 0.667.
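To make the idea of treating packet data like text more concrete, the sketch below turns raw packet bytes into "words" and builds the (center, context) pairs that a skip-gram Word2Vec model would train on. The abstract does not say how Packet2Vec tokenizes packets, so the overlapping byte n-grams, the window size, and the helper names `packet_to_tokens` and `skipgram_pairs` are all assumptions for illustration. Training the embedding itself is omitted.

```python
def packet_to_tokens(packet: bytes, n: int = 2) -> list:
    """Slide an n-byte window over the packet; each window becomes a hex 'word'.

    Assumption: overlapping byte n-grams stand in for words. The paper's
    actual tokenization may differ.
    """
    return [packet[i:i + n].hex() for i in range(len(packet) - n + 1)]


def skipgram_pairs(tokens, window: int = 2) -> list:
    """Build (center, context) pairs as in the skip-gram form of Word2Vec."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Six illustrative bytes, invented for this example.
packet = bytes.fromhex("4500003c1c46")
tokens = packet_to_tokens(packet, n=2)
pairs = skipgram_pairs(tokens, window=1)

print(tokens)
print(pairs[:3])
```

Each packet then plays the role of a sentence. One simple way to get a fixed-length feature vector per packet, again an assumption rather than the paper's stated method, would be to average the learned vectors of its tokens before passing them to a classifier.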

Code Implementations · 1 repo
