CL, Dec 2, 2024
Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge
Yuhe Ji, Yilun Liu, Feiyu Yao et al.
Log analysis is a critical sub-domain of AI applications that enables automatic fault and error management for large-scale software systems, reducing the labor required by traditional manual methods. While existing solutions based on large language models (LLMs) show promise, they are limited by a significant domain gap between natural language and log language (the latter contains many domain-specific tokens such as status codes, IP addresses, and resource paths), which restricts their effectiveness in real-world applications. Directly adapting general-purpose LLMs to log analysis using raw logs, however, may degrade their performance because of the inconsistent token distribution. In this paper, we present a domain adaptation approach that addresses these limitations by integrating interpretable domain knowledge into open-source LLMs through continual pre-training (CPT). The approach bridges the domain gap by adapting LLMs on interpretable natural-language texts that carry log knowledge, rather than on raw logs, to reduce the distribution discrepancy. To this end, we developed NLPLog, a comprehensive dataset containing over 250,000 question-answer pairs on log-related knowledge. Our resulting model, SuperLog, achieves the best performance across four log analysis tasks, with an average accuracy improvement of 12.01% over the second-best model. An ablation study also suggests that domain adaptation using interpretable log knowledge has advantages over adaptation using raw logs.
LG, Oct 9, 2018
Improvement of K Mean Clustering Algorithm Based on Density
Su Chang, Xu Zhenzong, Gao Xuan
The purpose of this paper is to improve the traditional K-means algorithm. In traditional K-means clustering, the initial cluster centers are chosen at random from the data set, which makes the algorithm prone to converging to a local minimum. Initial cluster centers selected on the basis of density are more representative. The experimental results show that the improved K-means algorithm eliminates the dependence on randomly chosen initial cluster centers and improves clustering accuracy.
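The abstract does not say how density is measured or how centers are picked from it, so the following is only a minimal sketch of one common density-based seeding heuristic, not the authors' method. It assumes that density is the number of points within a fixed `radius`, and that each further center maximizes density times distance to the nearest center already chosen. The names `local_density`, `density_init`, `kmeans`, and the `radius` parameter are all hypothetical.

```python
import math


def local_density(points, radius):
    """Assumed density measure: how many points lie within `radius` (self included)."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


def density_init(points, k, radius):
    """Choose k initial centers deterministically from local density."""
    density = local_density(points, radius)
    # Start from the densest point (ties go to the earliest index).
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            # Prefer dense points that are far from every center chosen so far.
            nearest = min(math.dist(points[i], points[c]) for c in chosen)
            return density[i] * nearest
        chosen.append(max(range(len(points)), key=score))
    return [points[i] for i in chosen]


def kmeans(points, centers, iterations=20):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


# Two well-separated toy groups (illustrative data, not from the paper).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
initial = density_init(points, k=2, radius=1.5)
final_centers = kmeans(points, initial)
print(initial, final_centers)
```

Because the seeding involves no random draw, repeated runs on the same data start from the same centers, which is the kind of independence from random initialization the abstract describes.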
CL, Oct 9, 2018
Fake Comment Detection Based on Sentiment Analysis
Su Chang, Xu Zhenzhong, Gao Xuan
With the growth of e-commerce and review websites, online comments increasingly influence people's lives. More and more users share their consumption experiences and evaluate product quality through comments, and people refer to these comments when making decisions. This reliance on comments has given rise to fake comments: for profit or other improper motives, businesses fabricate consumption experiences to promote or slander products. Fake comments easily mislead users' opinions and decisions, and humans identify them with low accuracy. Detecting fake comments with natural language processing technology is therefore valuable for giving people access to truthful comment information. This paper uses sentiment analysis to detect fake comments.