LG · May 23, 2024

Large language models can be zero-shot anomaly detectors for time series?

Sarah Alnegheimish, Linh Nguyen, Laure Berti-Equille, Kalyan Veeramachaneni

arXiv:2405.14755v324.146 citations · h-index: 36 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses anomaly detection in time series data for applications such as monitoring and security. Its contribution is incremental: it adapts existing LLM capabilities to a new domain without surpassing specialized models.

The paper tackles zero-shot time series anomaly detection with large language models (LLMs). It introduces the sigllm framework with two methods, a prompt-based detector and a forecasting-based detector, and finds that while LLMs can detect anomalies, state-of-the-art deep learning models still achieve results 30% better.

Recent studies have shown the ability of large language models to perform a variety of tasks, including time series forecasting. The flexible nature of these models allows them to be used for many applications. In this paper, we present a novel study of large language models used for the challenging task of time series anomaly detection. This problem entails two aspects novel for LLMs: the need for the model to identify part of the input sequence (or multiple parts) as anomalous; and the need for it to work with time series data rather than the traditional text input. We introduce sigllm, a framework for time series anomaly detection using large language models. Our framework includes a time-series-to-text conversion module, as well as end-to-end pipelines that prompt language models to perform time series anomaly detection. We investigate two paradigms for testing the abilities of large language models to perform the detection task. First, we present a prompt-based detection method that directly asks a language model to indicate which elements of the input are anomalies. Second, we leverage the forecasting capability of a large language model to guide the anomaly detection process. We evaluated our framework on 11 datasets spanning various sources and 10 pipelines. We show that the forecasting method significantly outperformed the prompting method in all 11 datasets with respect to the F1 score. Moreover, while large language models are capable of finding anomalies, state-of-the-art deep learning models are still superior in performance, achieving results 30% better than large language models.
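The abstract names two building blocks: a time-series-to-text conversion and a forecasting-guided detector. The following is a minimal sketch of how such steps could look, not the sigllm implementation. The function names, the shift-and-quantize scheme, the comma-separated format, and the mean-plus-k-standard-deviations threshold are all assumptions made for illustration. The forecast is passed in as a plain list, so no language model call is shown.

```python
from statistics import mean, pstdev


def series_to_text(values, decimals=2):
    """Hypothetical conversion: shift values to be non-negative, keep a fixed
    number of decimals, drop the decimal point, and join with commas."""
    offset = min(values)
    scale = 10 ** decimals
    tokens = [str(round((v - offset) * scale)) for v in values]
    return ",".join(tokens), offset


def text_to_series(text, offset, decimals=2):
    """Invert series_to_text so a textual forecast can be compared with data."""
    scale = 10 ** decimals
    return [int(tok) / scale + offset for tok in text.split(",") if tok.strip()]


def flag_anomalies(actual, forecast, k=3.0):
    """Assumed detection rule: flag indices whose absolute forecast error
    exceeds mean(error) + k * std(error)."""
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    threshold = mean(errors) + k * pstdev(errors)
    return [i for i, e in enumerate(errors) if e > threshold]


if __name__ == "__main__":
    text, offset = series_to_text([1.0, 1.5, 0.25])
    print(text)  # string that would be placed in a prompt
    print(text_to_series(text, offset))

    actual = [1.0] * 20
    actual[12] = 10.0           # injected spike
    forecast = [1.0] * 20       # stand-in for a model's forecast
    print(flag_anomalies(actual, forecast))
```

In a forecasting-based pipeline of this kind, the text produced by `series_to_text` would be sent to a model, the returned text would be parsed with `text_to_series`, and the gap between forecast and observation would drive detection. A prompt-based variant would instead ask the model directly for the anomalous positions.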

