LG ST MLOct 14, 2016

Data-Driven Threshold Machine: Scan Statistics, Change-Point Detection, and Extreme Bandits

arXiv:1610.04599v11.0

Originality Incremental advance

AI Analysis

This work addresses a fundamental issue in learning tasks like scan statistics and change-point detection, offering a practical solution for scenarios with data dependence, though it appears incremental as it builds on extreme value theory.

The authors tackled the problem of selecting a threshold to bound tail probabilities of maximum values in dependent random sequences without distributional assumptions, achieving a distribution-free approach that is robust and computationally efficient, requiring only one sample path instead of many Monte Carlo samples.

We present a novel distribution-free approach, the data-driven threshold machine (DTM), for a fundamental problem at the core of many learning tasks: choose a threshold for a given pre-specified level that bounds the tail probability of the maximum of a (possibly dependent but stationary) random sequence. We do not assume data distribution, but rather relying on the asymptotic distribution of extremal values, and reduce the problem to estimate three parameters of the extreme value distributions and the extremal index. We specially take care of data dependence via estimating extremal index since in many settings, such as scan statistics, change-point detection, and extreme bandits, where dependence in the sequence of statistics can be significant. Key features of our DTM also include robustness and the computational efficiency, and it only requires one sample path to form a reliable estimate of the threshold, in contrast to the Monte Carlo sampling approach which requires drawing a large number of sample paths. We demonstrate the good performance of DTM via numerical examples in various dependent settings.

View on arXiv PDF

Similar