LG, ME, ML | Oct 31, 2022

Neural network-based CUSUM for online change-point detection

arXiv:2210.17312v6 | 3 citations | h-index: 22
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting distribution changes in sequential data, with applications such as anomaly detection. The advance is incremental: it integrates neural networks into the existing CUSUM statistical framework.

The paper tackles online change-point detection in high-dimensional data by introducing a neural network-based CUSUM method. It provides theoretical guarantees on standard metrics, namely average run length and expected detection delay, and reports strong performance on synthetic and real-world data.

Change-point detection, the task of detecting an abrupt change in the data distribution from sequential data, is a fundamental problem in statistics and machine learning. CUSUM is a popular statistical method for online change-point detection due to its efficiency from recursive computation and constant memory requirement, and it enjoys statistical optimality. CUSUM requires knowing the precise pre- and post-change distributions. However, the post-change distribution is usually unknown a priori since it represents an anomaly or novelty. Classic CUSUM can perform poorly when there is a mismatch between the model and the actual data. While likelihood ratio-based methods encounter challenges when facing high-dimensional data, neural networks have become an emerging tool for change-point detection, offering computational efficiency and scalability. In this paper, we introduce a neural network CUSUM (NN-CUSUM) for online change-point detection. We also present a general theoretical condition under which the trained neural networks can perform change-point detection, and identify which losses can achieve this goal. We further extend our analysis by combining it with the Neural Tangent Kernel theory to establish learning guarantees for the standard performance metrics, including the average run length (ARL) and expected detection delay (EDD). The strong performance of NN-CUSUM is demonstrated in detecting change-points in high-dimensional data using both synthetic and real-world data.
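
The recursive, constant-memory computation mentioned in the abstract can be illustrated with a minimal sketch of the classic CUSUM recursion, S_t = max(0, S_{t-1} + g_t), which raises an alarm the first time S_t exceeds a threshold. Everything named here is an assumption made for illustration and is not taken from the paper: the helper names `cusum_stopping_time`, `gaussian_llr`, and `simulate`, the unit-variance Gaussian mean shift from 0 to 1, and the threshold value. In the classic setting the increment g_t is the log-likelihood ratio of the post-change to the pre-change density. A neural variant would presumably supply g_t from a trained network instead, but the paper's exact construction and training loss are not reproduced here.

```python
import random


def cusum_stopping_time(increments, threshold):
    """Run S_t = max(0, S_{t-1} + g_t) over a stream of increments g_t.

    Returns the first 1-based index t with S_t > threshold, or None if the
    statistic never crosses it. Only the scalar S_t is kept in memory.
    """
    s = 0.0
    for t, g in enumerate(increments, start=1):
        s = max(0.0, s + g)  # reset to zero whenever the sum goes negative
        if s > threshold:
            return t
    return None


def gaussian_llr(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Log-likelihood ratio log f1(x)/f0(x) for N(mu1, sigma^2) vs N(mu0, sigma^2).

    Illustrative choice of pre- and post-change distributions (an assumption).
    """
    return ((mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)) / sigma ** 2


def simulate(change_point=200, n=400, threshold=8.0, seed=0):
    """Stream n Gaussian samples whose mean shifts from 0 to 1 at change_point."""
    rng = random.Random(seed)
    stream = (rng.gauss(0.0 if t < change_point else 1.0, 1.0) for t in range(n))
    return cusum_stopping_time((gaussian_llr(x) for x in stream), threshold)


if __name__ == "__main__":
    print("alarm raised at t =", simulate())
```

The threshold governs the trade-off between the two metrics named in the abstract: raising it lengthens the average run length (fewer false alarms before any change) at the cost of a larger expected detection delay.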

