LG, ST, ML | Jul 5, 2015

Scan $B$-Statistic for Kernel Change-Point Detection

arXiv:1507.01279v5 | 122 citations
Originality: Incremental advance
AI Analysis

This work addresses change-point detection for statisticians and machine learning practitioners, offering a more efficient method for controlling false alarm rates in both offline and online scenarios.

The paper tackles the problem of detecting abrupt change-points in large background datasets by proposing two computationally efficient kernel-based statistics, which are shown to perform well on synthetic and real data with highly accurate tail probability approximations.

Detecting the emergence of an abrupt change-point is a classic problem in statistics and machine learning. Kernel-based nonparametric statistics have been used for this task; they require fewer assumptions on the distributions than parametric approaches and can handle high-dimensional data. In this paper we focus on the scenario where the amount of background data is large, and propose two related, computationally efficient kernel-based statistics for change-point detection, inspired by the recently developed $B$-statistics. A novel theoretical result of the paper is the characterization of the tail probability of these statistics using the change-of-measure technique, which characterizes the tail of the detection statistic directly rather than deriving its asymptotic distribution under the null distribution. Such approximations are crucial for controlling the false alarm rate, which corresponds to the significance level in offline change-point detection and to the average run length in online change-point detection. Our approximations are shown to be highly accurate. They thus provide a convenient way to find detection thresholds in both the offline and online cases without resorting to more expensive simulation or bootstrapping. We show that our methods perform well on both synthetic and real data.
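
To make the idea of a block-based kernel statistic concrete, here is a minimal Python sketch of one way such a scan could be computed. It is an illustration under stated assumptions, not the paper's implementation: it averages an unbiased MMD^2 estimate between the most recent block of samples and several blocks drawn from the background data. The Gaussian kernel, the fixed bandwidth, and all function names (gaussian_kernel, mmd2_unbiased, block_averaged_mmd2, scan_over_block_sizes) are hypothetical choices made for this example. The values are left unnormalized, so they are not directly comparable across block sizes, and the threshold approximation described in the abstract is not reproduced.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased MMD^2 estimate for two blocks with the same number of rows."""
    b = x.shape[0]
    if y.shape[0] != b or b < 2:
        raise ValueError("blocks must have equal size of at least 2")
    kxy = gaussian_kernel(x, y, bandwidth)
    h = (
        gaussian_kernel(x, x, bandwidth)
        + gaussian_kernel(y, y, bandwidth)
        - kxy
        - kxy.T
    )
    np.fill_diagonal(h, 0.0)  # drop the i == j terms
    return h.sum() / (b * (b - 1))


def block_averaged_mmd2(reference, test_block, n_blocks, rng, bandwidth=1.0):
    """Average MMD^2 between test_block and n_blocks disjoint reference blocks."""
    b = test_block.shape[0]
    if reference.shape[0] < n_blocks * b:
        raise ValueError("not enough background data for the requested blocks")
    # Assumption: reference blocks are drawn at random without overlap.
    idx = rng.permutation(reference.shape[0])[: n_blocks * b].reshape(n_blocks, b)
    values = [mmd2_unbiased(reference[rows], test_block, bandwidth) for rows in idx]
    return float(np.mean(values))


def scan_over_block_sizes(reference, data, block_sizes, n_blocks, rng, bandwidth=1.0):
    """Unnormalized block-averaged MMD^2 of the last B samples, for each B."""
    return {
        b: block_averaged_mmd2(reference, data[-b:], n_blocks, rng, bandwidth)
        for b in block_sizes
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.normal(size=(2000, 2))
    # Synthetic stream whose last 40 samples have a shifted mean.
    stream = np.vstack([rng.normal(size=(60, 2)), rng.normal(loc=3.0, size=(40, 2))])
    print(scan_over_block_sizes(background, stream, [10, 20, 40], n_blocks=20, rng=rng))
```

In this sketch, a large value for some block size B suggests that the last B samples differ in distribution from the background. Averaging over several reference blocks, rather than comparing against all background data at once, keeps the cost of each evaluation quadratic in B instead of in the full background size.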

Code Implementations: 1 repo