LG, Oct 13, 2025

A Comprehensive Forecasting-Based Framework for Time Series Anomaly Detection: Benchmarking on the Numenta Anomaly Benchmark (NAB)

Mohammad Karami, Mostafa Jalali, Fatemeh Ghassemi

arXiv:2510.11141v1, h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the problem of inconsistent evaluation in anomaly detection for digital infrastructure, providing evidence-based guidance for practitioners, though it is incremental as it unifies existing methods rather than introducing a new paradigm.

The paper tackles the lack of systematic cross-domain evaluation in time series anomaly detection by presenting a comprehensive forecasting-based framework and benchmarking it on the Numenta Anomaly Benchmark. LSTM achieves the best performance (F1: 0.688), while Informer provides competitive accuracy (F1: 0.683) with faster training.

Time series anomaly detection is critical for modern digital infrastructures, yet existing methods lack systematic cross-domain evaluation. We present a comprehensive forecasting-based framework unifying classical methods (Holt-Winters, SARIMA) with deep learning architectures (LSTM, Informer) under a common residual-based detection interface. Our modular pipeline integrates preprocessing (normalization, STL decomposition), four forecasting models, four detection methods, and dual evaluation through forecasting metrics (MAE, RMSE, PCC) and detection metrics (Precision, Recall, F1, AUC). We conduct the first complete evaluation on the Numenta Anomaly Benchmark (58 datasets, 7 categories) with 232 model training runs and 464 detection evaluations, achieving a 100% success rate. LSTM achieves the best performance (F1: 0.688, ranking first or second on 81% of datasets) with exceptional correlation on complex patterns (PCC: 0.999). Informer provides competitive accuracy (F1: 0.683) with 30% faster training. Classical methods achieve perfect predictions on simple synthetic data with 60× lower cost but show 2-3× worse F1-scores on real-world datasets. Forecasting quality dominates detection performance: differences between detection methods (F1: 0.621-0.688) are smaller than between forecasting models (F1: 0.344-0.688). Our findings provide evidence-based guidance: use LSTM for complex patterns, Informer for efficiency-critical deployments, and classical methods for simple periodic data with resource constraints. The complete implementation and results establish baselines for future forecasting-based anomaly detection research.
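The residual-based detection interface described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The forecaster is a hypothetical stand-in (a seasonal median over earlier periods) rather than Holt-Winters, SARIMA, LSTM, or Informer. The mean-plus-k-sigma threshold is likewise an assumption, since the abstract does not name the four detection methods. All function names and the synthetic data are illustrative only.

```python
import numpy as np

def seasonal_median_forecast(series, period, n_seasons=3):
    """Stand-in forecaster (assumption): predict each point as the median of
    the values at the same phase in up to n_seasons earlier periods."""
    forecast = series.astype(float).copy()  # first period has no history: residual 0
    for t in range(period, len(series)):
        lags = series[t - period::-period][:n_seasons]  # t-period, t-2*period, ...
        forecast[t] = np.median(lags)
    return forecast

def detect_from_residuals(series, forecast, k=3.0):
    """Flag points whose absolute forecast residual exceeds mean + k * std.
    The threshold rule is an assumed example of a residual-based detector."""
    residuals = np.abs(series - forecast)
    threshold = residuals.mean() + k * residuals.std()
    return residuals > threshold

def precision_recall_f1(predicted, actual):
    """Point-wise detection metrics from boolean prediction and label arrays."""
    tp = int(np.sum(predicted & actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic periodic signal with two injected spikes (illustrative data only).
period = 24
t = np.arange(240)
series = np.sin(2 * np.pi * t / period)
labels = np.zeros(len(series), dtype=bool)
for idx in (100, 180):
    series[idx] += 5.0
    labels[idx] = True

forecast = seasonal_median_forecast(series, period)
flags = detect_from_residuals(series, forecast)
precision, recall, f1 = precision_recall_f1(flags, labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the detector only sees residuals, any forecaster that returns an array of predictions can be swapped in without touching the detection or evaluation code. That separation is what makes it possible to compare forecasting models and detection methods independently.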
