LG · May 23, 2024

Large language models can be zero-shot anomaly detectors for time series?

arXiv:2405.14755v3 · 46 citations · h-index: 36
Originality: Synthesis-oriented
AI Analysis

This paper addresses anomaly detection in time series data for applications like monitoring and security. The contribution is incremental: it adapts existing LLM capabilities to a new domain without surpassing specialized models.

The paper studies large language models (LLMs) as zero-shot time series anomaly detectors. It introduces the sigllm framework with two methods, prompt-based detection and forecasting-based detection, and finds that while LLMs can detect anomalies, state-of-the-art deep learning models still achieve results 30% better.

Recent studies have shown the ability of large language models to perform a variety of tasks, including time series forecasting. The flexible nature of these models allows them to be used for many applications. In this paper, we present a novel study of large language models used for the challenging task of time series anomaly detection. This problem entails two aspects novel for LLMs: the need for the model to identify part of the input sequence (or multiple parts) as anomalous; and the need for it to work with time series data rather than the traditional text input. We introduce sigllm, a framework for time series anomaly detection using large language models. Our framework includes a time-series-to-text conversion module, as well as end-to-end pipelines that prompt language models to perform time series anomaly detection. We investigate two paradigms for testing the abilities of large language models to perform the detection task. First, we present a prompt-based detection method that directly asks a language model to indicate which elements of the input are anomalies. Second, we leverage the forecasting capability of a large language model to guide the anomaly detection process. We evaluated our framework on 11 datasets spanning various sources and 10 pipelines. We show that the forecasting method significantly outperformed the prompting method in all 11 datasets with respect to the F1 score. Moreover, while large language models are capable of finding anomalies, state-of-the-art deep learning models are still superior in performance, achieving results 30% better than large language models.
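The two ideas in the abstract, converting a time series to text and flagging anomalies from forecast errors, can be illustrated with a minimal sketch. This is not the sigllm API: the function names `series_to_text` and `flag_by_forecast_error`, the fixed-precision rounding, and the mean-plus-`k`-standard-deviations threshold are all assumptions chosen for illustration. The forecast is passed in as a plain array, standing in for whatever a language model would predict.

```python
import numpy as np


def series_to_text(values, decimals=2):
    """Turn a numeric series into a comma-separated string of integers.

    Hypothetical conversion: shift so the minimum is 0, keep `decimals`
    digits of precision, and drop the decimal point.
    """
    arr = np.asarray(values, dtype=float)
    shifted = arr - arr.min()
    ints = np.round(shifted * 10**decimals).astype(int)
    return ",".join(str(v) for v in ints)


def flag_by_forecast_error(actual, forecast, k=3.0):
    """Return indices where |actual - forecast| exceeds mean + k * std.

    The threshold rule is an assumption for this sketch, not the
    paper's post-processing.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = np.abs(actual - forecast)
    threshold = errors.mean() + k * errors.std()
    return np.flatnonzero(errors > threshold).tolist()


if __name__ == "__main__":
    print(series_to_text([1.0, 1.5, 0.25]))

    actual = [1.0] * 20
    actual[7] = 10.0          # injected spike
    forecast = [1.0] * 20     # stand-in for a model's forecast
    print(flag_by_forecast_error(actual, forecast))
```

In the prompt-based paradigm, a string like the one produced by `series_to_text` would be placed in the prompt and the model asked to name the anomalous positions. In the forecasting paradigm, the model's predicted values take the place of `forecast` above.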

Code Implementations: 1 repo