CR · LG · ML · Dec 29, 2017

A Deep Belief Network Based Machine Learning System for Risky Host Detection

arXiv:1801.00025v1 · 11 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need to reduce false positives in security operations centers, though it is incremental: it applies known machine learning methods to a specific domain.

The paper tackles the problem of overwhelming false-positive alerts in enterprise cybersecurity by proposing a Deep Belief Network system for risky host detection, which the authors report is six times more effective than the enterprise's existing rule-based system.

To ensure the cyber security of an enterprise, a SIEM (Security Information and Event Management) system is typically in place to normalize security events from different preventive technologies and flag alerts. Analysts in the security operations center (SOC) investigate the alerts to decide whether they are truly malicious. However, the number of alerts is generally overwhelming, with the majority of them being false positives, and exceeds the SOC's capacity to handle them all. There is therefore a great need to reduce the false positive rate as much as possible. While most previous research focused on network intrusion detection, we focus on risk detection and propose an intelligent Deep Belief Network machine learning system. The system leverages alert information, various security logs, and analysts' investigation results in a real enterprise environment to flag hosts that have a high likelihood of being compromised. Text mining and graph-based methods are used to generate targets and create features for machine learning. In the experiment, the Deep Belief Network is compared with other machine learning algorithms, including a multi-layer neural network, random forest, support vector machine, and logistic regression. Results on real enterprise data indicate that the Deep Belief Network machine learning system performs better than the other algorithms for our problem and is six times more effective than the current rule-based system. We also implement the whole system, from data collection, label creation, and feature engineering to host score generation, in a real enterprise production environment.
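The abstract does not spell out the network architecture, so the following is only a generic sketch of how a Deep Belief Network is usually built. It stacks Bernoulli restricted Boltzmann machines (RBMs), each trained with one-step contrastive divergence (CD-1), greedily layer by layer, and feeds each layer's hidden activations into the next. The names `RBM` and `pretrain_dbn`, the layer sizes, the learning rate, and the toy binary host features are hypothetical illustrations, not the authors' implementation. In a full DBN, a supervised output layer (here it would presumably be trained on targets derived from analysts' investigation results) is added on top, and the whole stack is then fine-tuned with backpropagation. That step is omitted.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # W[i][j] connects visible unit i to hidden unit j.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_step(self, v0, lr):
        # Positive phase: hidden probabilities given the data vector.
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden states, reconstruct, re-infer hiddens.
        v1 = self.visible_probs(self.sample(h0))
        h1 = self.hidden_probs(v1)
        # Update: <v h>_data - <v h>_reconstruction.
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, data):
        # Mean squared error of a deterministic up-down pass.
        err = 0.0
        for v in data:
            r = self.visible_probs(self.hidden_probs(v))
            err += sum((a - b) ** 2 for a, b in zip(v, r))
        return err / len(data)


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining; returns the RBMs and top-layer features."""
    rng = random.Random(seed)
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(x[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in x:
                rbm.cd1_step(v, lr)
        rbms.append(rbm)
        # Hidden probabilities of this layer become the next layer's input.
        x = [rbm.hidden_probs(v) for v in x]
    return rbms, x
```

For example, `pretrain_dbn(hosts, [3, 2])` on a list of binary host vectors (hypothetical indicators such as "raised an alert" or "contacted a rare domain") returns the two trained RBMs and a 2-dimensional representation per host, which a supervised layer could then turn into a host risk score.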
