Documentation based Semantic-Aware Log Parsing
This work addresses the need for more accurate log parsing in machine learning and data mining tasks, though its contribution is incremental: it incorporates documentation as an additional resource.
The paper tackles the problem of log parsing by leveraging software documentation to improve accuracy, achieving better parsing for both documented and undocumented messages and discovering event template linkages.
With recent advances in deep learning techniques, there is rapidly growing interest in applying machine learning to log data. As a fundamental part of log analytics, accurate log parsing, which transforms raw logs into structured events, is critical for subsequent machine learning and data mining tasks. Previous approaches either analyze the source code for parsing or are data-driven, such as text clustering. They largely neglect another widely available and valuable resource for improving accuracy: software documentation, which provides detailed explanations of the messages. In this paper, we propose an approach and system framework that use documentation knowledge for log parsing. With parameter value identification, it can improve the parsing accuracy not only for documented messages but also for undocumented messages. In addition, it can discover the linkages between event templates that are established by shared parameters and that indicate the correlation of the event context.
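To make the idea concrete, the following is a minimal sketch of documentation-driven parsing. It is not the paper's system. The template syntax (`<name>` placeholders), the two example templates, and the helpers `compile_template`, `parse`, `mask_known_values`, and `link_templates` are all hypothetical names chosen for illustration. It assumes each parameter is a single whitespace-free token and that a placeholder name appears at most once per template.

```python
import re
from collections import defaultdict

# Hypothetical documented templates; <name> marks a parameter placeholder.
DOC_TEMPLATES = {
    "E1": "Interface <ifname> changed state to <state>",
    "E2": "Link failure detected on <ifname>",
}


def compile_template(template):
    """Turn a documented template into a regex with one named group per placeholder."""
    # re.split with a capturing group alternates literal text and placeholder names.
    parts = re.split(r"<(\w+)>", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>\\S+)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


COMPILED = {eid: compile_template(t) for eid, t in DOC_TEMPLATES.items()}


def parse(message):
    """Return (event_id, parameters) for a documented message, else (None, {})."""
    for eid, rx in COMPILED.items():
        m = rx.fullmatch(message)
        if m:
            return eid, m.groupdict()
    return None, {}


def mask_known_values(message, known_values):
    """Mask tokens of an undocumented message that were seen as parameter values."""
    return " ".join("<*>" if tok in known_values else tok for tok in message.split())


def link_templates(templates):
    """Group event ids by the placeholder names they share."""
    by_param = defaultdict(set)
    for eid, template in templates.items():
        for name in re.findall(r"<(\w+)>", template):
            by_param[name].add(eid)
    return {name: sorted(eids) for name, eids in by_param.items() if len(eids) > 1}
```

In this sketch, parameter values extracted from documented messages (for example `eth0`) can be reused to mask the same values in messages that match no documented template, and templates that share a placeholder such as `ifname` are reported as linked. A real system would need more robust value identification than exact token matching.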