syslrn: Learning What to Monitor for Efficient Anomaly Detection
The paper addresses the high overhead of system monitoring for anomaly detection, particularly in cloud environments such as OpenStack. The contribution is incremental, since it builds on existing monitoring approaches.
The paper tackles the problem of efficient anomaly detection in system monitoring by introducing syslrn, which learns normal behavior offline to tailor online instrumentation, and shows in a case study on OpenStack failures that it outperforms state-of-the-art log-analysis systems with minimal overhead.
While monitoring system behavior to detect anomalies and failures is important, existing methods based on log analysis can only be as good as the information contained in the logs, and other approaches that look at the OS-level software state introduce high overheads. We tackle the problem with syslrn, a system that first builds an understanding of a target system offline, and then tailors the online monitoring instrumentation based on the learned identifiers of normal behavior. While our syslrn prototype is still preliminary and lacks many features, we show in a case study for the monitoring of OpenStack failures that it can outperform state-of-the-art log-analysis systems with little overhead.
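The two-phase idea in the abstract (learn offline, then monitor only what was learned) can be illustrated with a minimal Python sketch. This is not the syslrn implementation, which the abstract does not describe in detail. The signal names (`db_connects`, `rpc_calls`, `tmp_files`), the stability-based selection rule, and the thresholds are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def learn_profile(normal_runs, max_rel_spread=0.2):
    """Offline phase (hypothetical rule): keep signals that are stable
    across runs of the system under normal conditions."""
    profile = {}
    # Only consider signals that were observed in every normal run.
    signals = set.intersection(*(set(run) for run in normal_runs))
    for name in sorted(signals):
        values = [run[name] for run in normal_runs]
        mu = mean(values)
        sigma = pstdev(values)
        # Assumption: a signal with low relative spread is a useful
        # identifier of normal behavior; noisy signals are dropped.
        if mu != 0 and sigma / abs(mu) <= max_rel_spread:
            profile[name] = (mu, sigma)
    return profile


def detect_anomalies(profile, observation, tolerance=3.0):
    """Online phase: inspect only the learned signals, which keeps the
    monitoring cost proportional to the size of the profile."""
    anomalies = []
    for name, (mu, sigma) in profile.items():
        value = observation.get(name)
        if value is None:
            # A learned signal that is missing is itself suspicious.
            anomalies.append(name)
            continue
        # Floor the allowed deviation so near-constant signals do not
        # trigger on tiny fluctuations (5% of the mean is an assumption).
        allowed = tolerance * max(sigma, 0.05 * abs(mu))
        if abs(value - mu) > allowed:
            anomalies.append(name)
    return sorted(anomalies)


# Hypothetical per-run counters collected while the system behaves normally.
normal_runs = [
    {"db_connects": 10, "rpc_calls": 40, "tmp_files": 3},
    {"db_connects": 10, "rpc_calls": 42, "tmp_files": 30},
    {"db_connects": 11, "rpc_calls": 41, "tmp_files": 1},
]
profile = learn_profile(normal_runs)
print(sorted(profile))
print(detect_anomalies(profile, {"db_connects": 10, "rpc_calls": 5, "tmp_files": 100}))
```

In this sketch the noisy `tmp_files` counter is dropped during the offline phase, so the online phase never has to collect it. That mirrors, in a simplified way, the abstract's point that tailoring the instrumentation to learned identifiers of normal behavior is what keeps overhead low.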