MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures
MoniLog addresses the need for scalable monitoring in cloud companies and large online platforms, where timely detection helps prevent service failures. The contribution appears incremental, as it builds on existing log-based detection methods.
The paper tackles real-time anomaly detection in cloud computing infrastructures by introducing MoniLog, a distributed system that detects sequential and quantitative anomalies in multi-source log streams. Its output classifier learns from administrator actions to label anomalies automatically and evaluate their criticality.
Within today's large-scale systems, one anomaly can impact millions of users. Detecting such events in real time is essential to maintain the quality of service, as it allows the monitoring team to prevent or diminish the impact of a failure. Logs are a core part of software development and maintenance because they record detailed information at runtime. Such log data are available in nearly all computer systems, and they enable developers as well as system maintainers to monitor and dissect anomalous events. For cloud computing companies and large online platforms in general, growth is linked to scaling potential. Automating the anomaly detection process is a promising way to keep monitoring capacity scalable as the volume of logs generated by modern systems increases. In this paper, we introduce MoniLog, a distributed approach to detecting anomalies in real time within large-scale environments. It aims to detect sequential and quantitative anomalies within a multi-source log stream. MoniLog is designed to structure a log stream and monitor it for anomalous sequences. Its output classifier learns from the administrator's actions to label anomalies and evaluate their criticality level.
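To make the three stages named in the abstract more concrete (structuring the log stream, flagging sequential and quantitative anomalies, and learning labels from administrator actions), here is a minimal Python sketch. It is not MoniLog's implementation, which the abstract does not describe. The regex-based templating, the bigram and per-window count checks, and the majority-vote labeler are simple stand-ins chosen for illustration, and all names (`to_template`, `WindowDetector`, `FeedbackLabeler`) are hypothetical.

```python
import re
from collections import Counter


def to_template(line: str) -> str:
    """Structure a raw log line by masking variable parts (assumed heuristic)."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line)


class WindowDetector:
    """Toy detector: unseen template transitions are 'sequential' anomalies,
    unusually high template counts inside a window are 'quantitative' ones."""

    def __init__(self, window: int = 3):
        self.window = window
        self.known_bigrams = set()
        self.max_counts = Counter()

    def _windows(self, templates):
        for i in range(len(templates) - self.window + 1):
            yield templates[i:i + self.window]

    def fit(self, lines):
        """Learn normal transitions and per-window count ceilings."""
        templates = [to_template(l) for l in lines]
        self.known_bigrams.update(zip(templates, templates[1:]))
        for w in self._windows(templates):
            for t, c in Counter(w).items():
                self.max_counts[t] = max(self.max_counts[t], c)

    def check(self, lines):
        """Return de-duplicated findings in order of first appearance."""
        templates = [to_template(l) for l in lines]
        findings = []
        for pair in zip(templates, templates[1:]):
            if pair not in self.known_bigrams:
                findings.append(("sequential", pair))
        for w in self._windows(templates):
            for t, c in Counter(w).items():
                if c > self.max_counts[t]:
                    findings.append(("quantitative", t))
        return list(dict.fromkeys(findings))


class FeedbackLabeler:
    """Majority vote over administrator labels (stand-in for a real classifier)."""

    def __init__(self):
        self.votes = {}

    def record(self, finding, label: str):
        self.votes.setdefault(finding, Counter())[label] += 1

    def predict(self, finding, default: str = "unreviewed") -> str:
        votes = self.votes.get(finding)
        return votes.most_common(1)[0][0] if votes else default


if __name__ == "__main__":
    normal = [
        "user 1 login", "user 1 read file 10", "user 1 logout",
        "user 2 login", "user 2 read file 7", "user 2 logout",
    ]
    detector = WindowDetector(window=3)
    detector.fit(normal)

    findings = detector.check(["user 3 login", "user 3 logout"])
    labeler = FeedbackLabeler()
    for f in findings:
        labeler.record(f, "critical")  # pretend an administrator escalated it
        print(f, "->", labeler.predict(f))
```

In a distributed, multi-source setting each stage would presumably run as its own service over a stream rather than over in-memory lists. The sketch only shows how the two anomaly types differ: a sequential anomaly is an unexpected ordering of events, while a quantitative anomaly is an unexpected number of otherwise familiar events.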