CR · Jun 10, 2017

Analysis of Anomalies in the Internet Traffic Observed at the Campus Network Gateway

arXiv:1706.03206v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of up-to-date, real-world data for evaluating intrusion detection systems, a gap that matters to security administrators. The contribution is incremental, however, because it focuses on data analysis rather than on new methods.

The study characterizes anomalies in Internet traffic from a large academic network. It analyzes 6.5 TB of compressed tcpdump data covering 12 hours of traffic, reports the anomalies observed, and documents issues that could lead to misinterpretation in intrusion detection research.

A considerable portion of the machine learning literature applied to intrusion detection uses outdated data sets based on a simulated network with a limited environment. Moreover, datasets usually contain flaws, and the way we handle them may affect measurements. Finally, the detection capability of an intrusion detection system is highly influenced by the system configuration. We focus on a rarely investigated topic: the characterization of anomalies in a large network environment. Intrusion detection systems (IDSs) are used to detect exploits or other attacks that raise alarms. Anomalous events usually receive less attention than attack alarms, so security administrators frequently overlook them. However, observing this activity contributes to understanding the characteristics of the network traffic. On the one hand, abnormal behaviors may be legitimate, e.g., misinterpreted protocols or malfunctioning network equipment; on the other hand, an attacker may intentionally craft packets that introduce anomalies in order to evade monitoring systems. Anomalies found in operational network environments may indicate evasion attacks, application bugs, and a wide variety of other factors that strongly influence intrusion detection performance. This study explores the nature of the anomalies found in the U-Tokyo network, using the Bro and Snort IDSs cooperatively, among other resources. We analyze 6.5 TB of compressed binary tcpdump data representing 12 hours of network traffic. Our major contributions can be summarized as follows: 1) reporting the anomalies observed in real, up-to-date traffic from a large academic network environment, and documenting problems in research that may lead to wrong results due to misinterpretation of data or misconfiguration of software; 2) assessing the quality of the data by analyzing the potential and the real problems in the capture process.
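As a rough illustration of what characterizing anomalies from IDS output can look like, the sketch below tallies anomaly names from a Bro-style tab-separated log such as `weird.log`, where Bro records protocol anomalies. It is not the authors' tooling: the function name `count_weird_names` and the sample rows in `SAMPLE_LOG` are made up for this example, and the column layout is read from the log's own `#fields` header rather than assumed.

```python
import io
from collections import Counter


def count_weird_names(lines):
    """Tally the `name` column of a Bro-style TSV log (hypothetical helper)."""
    fields = None
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#fields"):
            # Header row: "#fields<TAB>ts<TAB>uid<TAB>..."; keep the column names.
            fields = line.split("\t")[1:]
            continue
        if line.startswith("#"):
            # Other metadata rows (#separator, #path, #types, ...) are skipped.
            continue
        if fields is None:
            raise ValueError("no #fields header seen before data rows")
        row = dict(zip(fields, line.split("\t")))
        counts[row.get("name", "-")] += 1
    return counts


# Made-up sample rows for illustration only; not data from the paper.
SAMPLE_LOG = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tname\taddl\tnotice\tpeer\n"
    "1497052800.0\tCa1\t192.0.2.10\t40000\t198.51.100.5\t80\tbad_TCP_checksum\t-\tF\tbro\n"
    "1497052801.0\tCa2\t192.0.2.11\t40001\t198.51.100.5\t443\tdata_before_established\t-\tF\tbro\n"
    "1497052802.0\tCa3\t192.0.2.12\t40002\t198.51.100.6\t80\tbad_TCP_checksum\t-\tF\tbro\n"
)

if __name__ == "__main__":
    counts = count_weird_names(io.StringIO(SAMPLE_LOG))
    for name, n in counts.most_common():
        print(name, n)
```

On a real capture, the same function could be fed an open file handle instead of the in-memory sample. Ranking anomaly names by frequency is only a first step; deciding whether a frequent anomaly reflects an evasion attempt, an application bug, or a capture problem still requires inspecting the underlying traffic.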

