DC · LG · Dec 1, 2018

A Big Data Architecture for Log Data Storage and Analysis

arXiv:1812.00111v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses log data management and anomaly detection for large-scale intranet systems, but it appears incremental as it combines existing tools and methods without introducing major innovations.

The authors tackle the problem of analyzing database connection logs across a large intranet with over 10,000 users. They propose a big data architecture that uses Flume and Hadoop for long-term storage, ElasticSearch and Kibana for short-term visualization, and machine learning models to predict anomalies from the log data. No concrete performance numbers are provided.

We propose an architecture for analysing database connection logs across different instances of databases within an intranet comprising over 10,000 users and associated devices. Our system uses Flume agents to send notifications to the Hadoop Distributed File System (HDFS) for long-term storage, and ElasticSearch and Kibana for short-term visualisation, effectively creating a data lake for the extraction of log data. We adopt machine learning models with an ensemble of approaches to filter and process the indicators within the data, and aim to predict anomalies or outliers using feature vectors built from this log data.
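The abstract mentions feature vectors built from connection logs and an ensemble of approaches for flagging outliers, but it does not name the features or the models. The sketch below is one minimal way to realise that idea, under assumptions of our own: the event fields (`user`, `db_instance`, `hour`, `failed`), the four per-user features, the "off-hours" window, and the three detectors (standard z-score, median/MAD modified z-score, Tukey IQR fences) combined by majority vote are all hypothetical choices, not the authors' pipeline.

```python
"""Hypothetical sketch: per-user feature vectors from database connection
logs, scored by a majority vote of three simple outlier detectors.

Everything here is an illustrative assumption, not the paper's method:
the event fields, the features, the detectors and their thresholds.
"""
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median, pstdev, quantiles


@dataclass
class ConnectionEvent:
    user: str          # account that opened the connection (assumed field)
    db_instance: str   # database instance connected to (assumed field)
    hour: int          # hour of day, 0-23, when the connection was opened
    failed: bool       # True if the login attempt was rejected


def build_features(events):
    """Map each user to [connections, distinct instances, failure ratio, off-hours ratio]."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user].append(e)
    features = {}
    for user, evs in by_user.items():
        n = len(evs)
        features[user] = [
            float(n),
            float(len({e.db_instance for e in evs})),
            sum(e.failed for e in evs) / n,
            # assumed off-hours window: before 07:00 or from 20:00 onwards
            sum(e.hour < 7 or e.hour >= 20 for e in evs) / n,
        ]
    return features


def zscore_flags(column, threshold=3.0):
    """Flag indices whose standard score exceeds the threshold."""
    mu, sd = mean(column), pstdev(column)
    if sd == 0:
        return set()  # constant column carries no signal
    return {i for i, x in enumerate(column) if abs(x - mu) / sd > threshold}


def mad_flags(column, threshold=3.5):
    """Flag indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(column)
    mad = median(abs(x - med) for x in column)
    if mad == 0:
        return set()  # no robust spread in this column, so it casts no vote
    return {i for i, x in enumerate(column) if 0.6745 * abs(x - med) / mad > threshold}


def iqr_flags(column, k=1.5):
    """Flag indices outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(column, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, x in enumerate(column) if x < lo or x > hi}


def detect_anomalies(events, min_votes=2):
    """Return users flagged (on any feature) by at least `min_votes` detectors."""
    features = build_features(events)
    users = sorted(features)
    if len(users) < 3:
        return []  # too few users for meaningful statistics
    votes = defaultdict(int)
    for detector in (zscore_flags, mad_flags, iqr_flags):
        flagged = set()
        for col in range(4):
            column = [features[u][col] for u in users]
            flagged |= detector(column)
        for i in flagged:
            votes[users[i]] += 1  # one vote per detector, however many features tripped it
    return sorted(u for u, v in votes.items() if v >= min_votes)
```

In use, a batch of parsed log events (for example, records read back from the long-term store) is passed to `detect_anomalies`, which returns the accounts that at least two of the three detectors consider unusual. Requiring agreement between a non-robust detector (z-score) and two robust ones (MAD, IQR) is one simple way to reduce false alarms from any single rule; a production system would more likely use learned models, as the abstract suggests.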
