CR LG ML Jul 10, 2019

On Designing Machine Learning Models for Malicious Network Traffic Classification

arXiv:1907.04846v1 14 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of public benchmarks and standardized methods for cybersecurity practitioners, offering incremental recommendations to improve ML-based malicious traffic classification.

The paper uses botnet detection from network traffic data as a case study for guidelines on applying supervised machine learning in cybersecurity. It highlights three points: feature representations should reflect attack characteristics, ensemble models handle class imbalance well, and the granularity of ground truth is crucial.

Machine learning (ML) is becoming widely deployed in cyber security settings to shorten the detection cycle of cyber attacks. To date, most ML-based systems are either proprietary or make specific choices of feature representations and machine learning models. The success of these techniques is difficult to assess because public benchmark datasets are currently unavailable. In this paper, we provide concrete guidelines and recommendations for using supervised ML in cyber security. As a case study, we consider the problem of botnet detection from network traffic data. Among our findings, we highlight that (1) feature representations should take attack characteristics into consideration; (2) ensemble models are well-suited to handling class imbalance; and (3) the granularity of ground truth plays an important role in the success of these methods.
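
Finding (2) can be made concrete with a toy sketch. The abstract does not say which ensemble models or features the authors used, so everything below is an illustrative assumption rather than the paper's method: a single synthetic 1-D feature, Gaussian benign and malicious samples, a threshold "stump" as the base learner, and a balanced-bootstrap bagging scheme. All function names are hypothetical.

```python
import random

def make_data(n_benign, n_malicious, seed):
    """Synthetic 1-D samples (assumed): benign ~ N(0, 1), malicious ~ N(2, 1)."""
    rng = random.Random(seed)
    benign = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_benign)]
    malicious = [(rng.gauss(2.0, 1.0), 1) for _ in range(n_malicious)]
    return benign + malicious

def fit_stump(samples):
    """Return the threshold t minimizing training errors for 'predict 1 if x >= t'."""
    candidates = sorted({x for x, _ in samples}) + [float("inf")]
    best_t, best_err = None, None
    for t in candidates:
        err = sum((x >= t) != bool(y) for x, y in samples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def fit_balanced_bagging(samples, n_estimators=25, seed=0):
    """Train each stump on a bootstrap with equal numbers of both classes."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    k = min(len(pos), len(neg))
    stumps = []
    for _ in range(n_estimators):
        boot = [rng.choice(pos) for _ in range(k)]
        boot += [rng.choice(neg) for _ in range(k)]
        stumps.append(fit_stump(boot))
    return stumps

def predict(stumps, x):
    """Majority vote over the stump thresholds."""
    votes = sum(x >= t for t in stumps)
    return int(2 * votes > len(stumps))

def recall(stumps, samples):
    """Fraction of malicious samples that the vote flags as malicious."""
    pos = [x for x, y in samples if y == 1]
    return sum(predict(stumps, x) for x in pos) / len(pos)
```

A possible comparison, under the same assumptions: build `train = make_data(2000, 40, seed=1)` and `test = make_data(2000, 40, seed=2)`, then compare `recall([fit_stump(train)], test)` with `recall(fit_balanced_bagging(train), test)`. A single stump fit on the imbalanced data tends to choose a high threshold that misses most of the rare class, while the balanced ensemble trades some false positives for higher recall on it.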

