SE · Jun 12, 2018

A Directed Acyclic Graph Approach to Online Log Parsing

arXiv:1806.04356v1 · 34 citations
Originality: Highly original
AI Analysis

This work addresses the need for efficient, parameter-free online log parsing in distributed systems and other software, offering a practical tool for system reliability management.

The paper tackles real-time parsing of unstructured log messages for software system monitoring by proposing Drain, an online log parsing method based on a directed acyclic graph. The results show that Drain achieves the highest accuracy on all 11 real-world datasets and improves running time by 37.15% to 97.14% over state-of-the-art online parsers.

Logs are widely used in modern software system management because they are often the only available data that record system events at runtime. In recent years, because of ever-increasing log sizes, data mining techniques have often been used to help developers and operators conduct system reliability management. A typical log-based system reliability management procedure first parses the log messages, because of their unstructured format, and then applies data mining techniques to the parsed logs to obtain critical information about system behavior. Most existing studies focus on offline log parsing, which must parse logs in batch mode. However, software systems, especially distributed systems, require online monitoring and maintenance. Thus, a log parser that can parse log messages in a streaming manner is in high demand. To address this problem, we propose an online log parsing method, namely Drain, based on a directed acyclic graph that encodes specially designed rules for parsing. Drain can automatically generate a directed acyclic graph for a new system and update the graph according to the incoming log messages. In addition, Drain frees developers from the burden of parameter tuning by allowing them to use Drain with no pre-defined parameters. To evaluate the performance of Drain, we collect 11 log datasets generated by real-world systems, ranging from distributed systems, Web applications, supercomputers, and operating systems to standalone software. The experimental results show that Drain has the highest accuracy on all 11 datasets. Moreover, Drain obtains a 37.15% to 97.14% improvement in running time over the state-of-the-art online parsers. We also conduct a case study on a log-based anomaly detection task using Drain in the parsing step, which demonstrates its effectiveness in system reliability management.
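
The abstract does not spell out Drain's parsing rules, so the following is only a minimal sketch of what streaming template extraction can look like, not the paper's algorithm. It assumes a simple two-level routing structure (token count, then first token) and a token-overlap similarity; the names `StreamingParser`, `LogGroup`, and `sim_threshold` are hypothetical, and the fixed threshold is an illustration-only shortcut that the parameter-free method described above avoids.

```python
from dataclasses import dataclass, field
from typing import Dict, List

WILDCARD = "<*>"  # placeholder for a variable field in a template


@dataclass
class LogGroup:
    """One log template plus the number of messages matched to it."""
    template: List[str]
    count: int = 0


@dataclass
class StreamingParser:
    # Hypothetical fixed threshold, used here only to keep the sketch short.
    sim_threshold: float = 0.5
    # Routing structure: token count -> first token -> candidate groups.
    root: Dict[int, Dict[str, List[LogGroup]]] = field(default_factory=dict)

    @staticmethod
    def _similarity(template: List[str], tokens: List[str]) -> float:
        # Fraction of positions where template and message agree exactly.
        same = sum(1 for t, w in zip(template, tokens) if t == w)
        return same / len(tokens)

    def parse(self, message: str) -> str:
        """Consume one message, update the structure, return its template."""
        tokens = message.split()
        if not tokens:
            return ""
        # Assumption: a first token containing digits is treated as variable.
        first = WILDCARD if any(c.isdigit() for c in tokens[0]) else tokens[0]
        groups = self.root.setdefault(len(tokens), {}).setdefault(first, [])

        best, best_sim = None, -1.0
        for group in groups:
            sim = self._similarity(group.template, tokens)
            if sim > best_sim:
                best, best_sim = group, sim

        if best is None or best_sim < self.sim_threshold:
            # No sufficiently similar template: start a new group.
            best = LogGroup(template=list(tokens))
            groups.append(best)
        else:
            # Generalize the template where the new message disagrees.
            best.template = [
                t if t == w else WILDCARD for t, w in zip(best.template, tokens)
            ]
        best.count += 1
        return " ".join(best.template)


if __name__ == "__main__":
    parser = StreamingParser()
    for line in [
        "Receiving block blk_1 src: 10.0.0.1",
        "Receiving block blk_2 src: 10.0.0.2",
        "Deleting block blk_1 file /tmp/x",
    ]:
        print(parser.parse(line))
```

Each call to `parse` handles a single message and updates the structure in place, which is the streaming property the abstract emphasizes: no batch of logs has to be available up front.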

