Anomaly Detection for Network Connection Logs
This work addresses anomaly detection for network administrators managing large-scale infrastructures, but it appears incremental as it combines existing tools and methods.
The paper tackles the problem of detecting anomalies in untagged, unfiltered network connection logs by building a streaming architecture on ELK, Spark, and Hadoop for near real-time analysis. The authors propose that the approach can be extended to infrastructures of thousands of nodes generating hundreds of log lines per second.
We leverage a streaming architecture based on ELK, Spark, and Hadoop to collect, store, and analyse database connection logs in near real-time. The proposed system investigates outliers using unsupervised learning, applying widely adopted clustering and classification algorithms to log data and highlighting the subtle variances in each model by visualising the outliers. Arriving at a novel solution for evaluating untagged, unfiltered connection logs, we propose an approach that can be extrapolated to a generalised system for analysing connection logs across a large infrastructure comprising thousands of individual nodes and generating hundreds of log lines per second.
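To make the idea of unsupervised outlier detection on untagged logs concrete, the following is a minimal sketch, not the authors' pipeline. It swaps in a simple k-nearest-neighbour distance score as a stand-in for the clustering algorithms the abstract mentions, and it runs on a small in-memory batch rather than on Spark. The feature columns (connections per minute, distinct clients), the function names, and the threshold factor are all hypothetical choices made for illustration; a real deployment would also need to scale the features before measuring distances.

```python
import math
import statistics
from typing import List, Sequence


def knn_distance_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Points in dense regions get small scores; isolated points get large ones.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


def flag_outliers(points: Sequence[Sequence[float]], k: int = 2, factor: float = 3.0) -> List[int]:
    """Return indices whose score exceeds `factor` times the median score.

    The median-based threshold is an assumption of this sketch, chosen because
    no labels are available to tune a cut-off.
    """
    scores = knn_distance_scores(points, k)
    threshold = factor * statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical per-host features: (connections per minute, distinct clients).
sample = [(10, 1), (11, 1), (12, 2), (10, 2), (11, 2), (12, 1), (500, 40)]
print(flag_outliers(sample))  # prints [6]: only the last host is far from the rest
```

The scoring is quadratic in the number of points, so it only suits small windows. At the scale the abstract describes, the same scoring idea would have to be expressed as a distributed job over each streaming micro-batch.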