SENI · Dec 16, 2020

Summarizing Unstructured Logs in Online Services

arXiv:2012.08938v1 · 18 citations · Has Code
AI Analysis

This work addresses the inefficiency of manual or rule-based log summarization for operators managing large-scale online services.

This paper introduces LogSummary, an automatic and unsupervised framework for summarizing unstructured logs in online services. It extracts important log triples and ranks them using global knowledge, achieving an average ROUGE F1 score of 0.741 on four open-source log datasets.

Logs are one of the most valuable data sources for managing large-scale online services. After a failure is detected, diagnosed, or predicted, operators still have to inspect the raw logs to gain a summarized view before taking action. However, manual or rule-based log summarization has become inefficient and ineffective. In this work, we propose LogSummary, an automatic, unsupervised, end-to-end log summarization framework for online services. LogSummary obtains the summarized triples of important logs for a given log sequence. It integrates a novel information extraction method, which takes both semantic information and domain knowledge into consideration, with a new triple ranking approach that uses the global knowledge learned from all logs. Given the lack of a publicly available gold standard for log summarization, we have manually labelled the summaries of four open-source log datasets and made them publicly available. The evaluation on these datasets, as well as the case studies on real-world logs, demonstrates that LogSummary produces highly representative summaries (average ROUGE F1 score of 0.741). We have packaged LogSummary into an open-source toolkit and hope that it can benefit future NLP-powered summarization work.
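
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a ROUGE F1 score can be computed for one summary against one reference. It assumes the unigram variant (ROUGE-1) with lowercased whitespace tokenization; the abstract does not state which ROUGE variant or tokenizer the authors used, and the function name `rouge1_f1` and the example log strings are hypothetical illustrations, not taken from the paper or its toolkit.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Assumption: lowercased whitespace tokenization, single reference.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summary and hand-written reference for a log sequence.
candidate = "interface changed state to down"
reference = "interface state changed to down on port"
score = rouge1_f1(candidate, reference)
print(round(score, 3))  # 0.833 (precision 5/5, recall 5/7)
```

Under this reading, a corpus-level figure such as the reported 0.741 would be the average of such per-summary scores across the labelled datasets.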

