SEFeb 18, 2022
Pinpointing Anomaly Events in Logs from Stability Testing -- N-Grams vs. Deep-LearningMika Mäntylä, Martín Varela, Shayan Hashemi
As stability testing execution logs can be very long, software engineers need help in locating anomalous events. We develop and evaluate two models for scoring individual log-events for anomalousness, namely an N-Gram model and a Deep Learning model with LSTM (Long short-term memory). Both are trained on normal log sequences only. We evaluate the models with long log sequences of Android stability testing in our company case and with short log sequences from HDFS (Hadoop Distributed File System) public dataset. We evaluate next event prediction accuracy and computational efficiency. The LSTM model is more accurate in stability testing logs (0.848 vs 0.865), whereas in HDFS logs the N-Gram is slightly more accurate (0.904 vs 0.900). The N-Gram model has far superior computational efficiency compared to the Deep model (4 to 13 seconds vs 16 minutes to nearly 4 hours), making it the preferred choice for our case company. Scoring individual log events for anomalousness seems like a good aid for root cause analysis of failing test cases, and our case company plans to add it to its online services. Despite the recent surge in using deep learning in software system anomaly detection, we found limited benefits in doing so. However, future work should consider whether our finding holds with different LSTM-model hyper-parameters, other datasets, and with other deep-learning approaches that promise better accuracy and computational efficiency than LSTM based models.
SEApr 15, 2021
OneLog: Towards End-to-End Training in Software Log Anomaly DetectionShayan Hashemi, Mika Mäntylä
With the growth of online services, IoT devices, and DevOps-oriented software development, software log anomaly detection is becoming increasingly important. Prior works mainly follow a traditional four-staged architecture (Preprocessor, Parser, Vectorizer, and Classifier). This paper proposes OneLog, which utilizes a single Deep Neural Network (DNN) instead of multiple separate components. OneLog harnesses Convolutional Neural Networks (CNN) at the character level to take digits, numbers, and punctuations, which were removed in prior works, into account alongside the main natural language text. We evaluate our approach in six message- and sequence-based data sets: HDFS, Hadoop, BGL, Thunderbird, Spirit, and Liberty. We experiment with Onelog with single-, multi-, and cross-project setups. Onelog offers state-of-the-art performance in our datasets. Onelog can utilize multi-project datasets simultaneously during training, which suggests our model can generalize between datasets. Multi-project training also improves Onelog performance making it ideal when limited training data is available for an individual project. We also found that cross-project anomaly detection is possible with a single project pair (Liberty and Spirit). Analysis of model internals shows that one log has multiple modes of detecting anomalies and that the model learns manually validated parsing rules for the log messages. We conclude that character-based CNNs are a promising approach toward end-to-end learning in log anomaly detection. They offer good performance and generalization over multiple datasets. We will make our scripts publicly available upon the acceptance of this paper.
SEFeb 2, 2021
Detecting Anomalies in Software Execution Logs with Siamese NetworkShayan Hashemi, Mika Mäntylä
Logs are semi-structured text files that represent software's execution paths and states during its run-time. Therefore, detecting anomalies in software logs reflect anomalies in the software's execution path or state. So, it has become a notable concern in software engineering. We use LSTM like many prior works, and on top of LSTM, we propose a novel anomaly detection approach based on the Siamese network. This paper also provides an authentic validation of the approach on the Hadoop Distributed File System (HDFS) log dataset. To the best of our knowledge, the proposed approach outperforms other methods on the same dataset at the F1 score of 0.996, resulting in a new state-of-the-art performance on the dataset. Along with the primary method, we introduce a novel training pair generation algorithm that reduces generated training pairs by the factor of 3000 while maintaining the F1 score, merely a modest decay from 0.996 to 0.995. Additionally, we propose a hybrid model by combining the Siamese network with a traditional feedforward neural network to make end-to-end training possible, reducing engineering effort in setting up a deep-learning-based log anomaly detector. Furthermore, we examine our method's robustness to log evolutions by evaluating the model on synthetically evolved log sequences; we got the F1 score of 0.95 at the noise ratio of 20%. Finally, we dive deep into some of the side benefits of the Siamese network. Accordingly, we introduce a method of monitoring the evolutions of logs without label requirements at run-time. Additionally, we present a visualization technique that facilitates human administrations of log anomaly detection.