LG AI Jun 3

Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate

Marc Pinet, Julien Cumin, Samuel Berlemont, Dominique Vaufreydaz

arXiv:2606.02670 31.1 h-index: 8

AI Analysis

For researchers in multivariate time series anomaly detection, this paper reveals that widely used benchmarks do not test cross-channel capabilities, calling for new evaluation sets.

The paper evaluates whether multivariate time series anomaly detection benchmarks contain anomalies that require cross-channel modeling. On eight benchmarks, it finds no cross-channel rupture that occurs without an accompanying univariate deviation. On six of them, at least half of the labeled anomaly segments deviate univariately on 89% to 100% of their timesteps. The authors conclude that current benchmarks are unsuitable for validating cross-channel modeling.

Many recent multivariate time series anomaly detection (MTSAD) models incorporate cross-channel modeling, under the implicit assumption that the structure of anomalies may be spread across multiple channels. We evaluate this assumption on eight widely used public benchmarks by introducing a per-segment diagnostic framework that flags, for each labeled anomaly, whether at least one channel deviates individually from its normal history, whether the cross-channel correlation structure changes, or both. The framework shows that no cross-channel rupture occurs without an accompanying univariate deviation across a range of reasonable thresholds. A complementary metric also reveals that on six of the eight benchmarks, at least half of the labeled anomaly segments deviate univariately on 89% to 100% of their timesteps, reaching 100% on three of these datasets. To verify that our framework captures cross-channel structure when present, we construct synthetic data of phase-shifted sinusoidal channels with shared noise. Each anomalous segment is altered through one of two channel-wise corruptions that preserve the per-channel marginal distribution while breaking cross-channel structure, and our framework correctly characterizes these segments as cross-channel-only. On these data, channel-dependent (CD) models successfully exploit the cross-channel signal whereas channel-independent (CI) ones fail. The CI/CD comparison of a recent SOTA detector on real benchmarks further confirms that CD modeling brings no measurable gain. We conclude that current MTSAD benchmarks are unsuitable for validating cross-channel modeling capabilities, and we call for the development of more structurally diverse evaluation sets. The code for this study is publicly available.
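The synthetic check and the per-segment diagnostic described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not specify the two corruptions, the deviation statistic, or the thresholds, so everything below is an assumption: the function names (`make_channels`, `corrupt_segment`, `diagnose`) are hypothetical, the corruption is a time permutation of one channel inside the segment (one possible choice that leaves the per-channel marginal distribution untouched), the univariate test is a z-score against the normal history, and the cross-channel test is the largest change in Pearson correlation.

```python
import numpy as np


def make_channels(n_steps, n_channels, rng, noise_scale=0.1, period=50):
    """Phase-shifted sinusoidal channels that all share one noise series."""
    t = np.arange(n_steps)
    # Assumed phase layout: evenly spaced shifts in [0, pi/2).
    phases = np.linspace(0.0, np.pi / 2, n_channels, endpoint=False)
    shared = rng.normal(0.0, noise_scale, n_steps)
    return np.sin(2 * np.pi * t[:, None] / period + phases[None, :]) + shared[:, None]


def corrupt_segment(x, start, end, channel, rng):
    """Permute one channel in time inside [start, end).

    The set of values in that channel is unchanged, so its marginal
    distribution is preserved, but its alignment with the other channels
    is destroyed. Assumed corruption, not taken from the paper.
    """
    y = x.copy()
    y[start:end, channel] = rng.permutation(y[start:end, channel])
    return y


def diagnose(x_normal, segment, z_thresh=3.0, corr_thresh=0.3):
    """Flag univariate deviation and cross-channel change for one segment.

    Thresholds are illustrative assumptions.
    """
    mu = x_normal.mean(axis=0)
    sd = x_normal.std(axis=0)
    z = np.abs((segment - mu) / sd)
    # Fraction of timesteps where at least one channel leaves its normal range.
    univariate_fraction = float((z > z_thresh).any(axis=1).mean())
    # Largest absolute change in pairwise Pearson correlation.
    corr_normal = np.corrcoef(x_normal, rowvar=False)
    corr_segment = np.corrcoef(segment, rowvar=False)
    corr_shift = float(np.abs(corr_segment - corr_normal).max())
    return {
        "univariate": univariate_fraction > 0.0,
        "univariate_fraction": univariate_fraction,
        "cross_channel": corr_shift > corr_thresh,
        "corr_shift": corr_shift,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = make_channels(2000, 3, rng)
    corrupted = corrupt_segment(x, 1200, 1300, channel=0, rng=rng)
    print("clean segment:    ", diagnose(x[:1000], x[1200:1300]))
    print("corrupted segment:", diagnose(x[:1000], corrupted[1200:1300]))
```

Under these assumptions, a segment that is flagged as `cross_channel` but not `univariate` corresponds to what the abstract calls a cross-channel-only anomaly, and `univariate_fraction` plays the role of the complementary per-timestep metric. Real benchmarks would need a more careful choice of normal history and thresholds than this sketch uses.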
