Gaussian Process Subset Scanning for Anomalous Pattern Detection in Non-iid Data
This work addresses the challenge of identifying emerging irregularities in correlated data for applications such as public health monitoring and disaster response, and represents an incremental advance over existing subset scanning techniques.
The paper tackles the problem of detecting subtle anomalous patterns in non-iid data by introducing methods that combine Gaussian processes with novel log-likelihood ratio statistics and subset scanning. It reports improved detection power on simulations and on real-world datasets such as opioid overdose deaths and storm reports.
Identifying anomalous patterns in real-world data is essential for understanding where, when, and how systems deviate from their expected dynamics. Yet methods that separately consider the anomalousness of each individual data point have low detection power for subtle, emerging irregularities. Additionally, recent detection techniques based on subset scanning make strong independence assumptions and suffer degraded performance in correlated data. We introduce methods for identifying anomalous patterns in non-iid data by combining Gaussian processes with novel log-likelihood ratio statistics and subset scanning techniques. Our approaches are powerful, interpretable, and can integrate information across multiple data streams. We illustrate their performance on numeric simulations and three open-source spatiotemporal datasets of opioid overdose deaths, 311 calls, and storm reports.
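
To make the idea of scoring subsets under a correlated null model concrete, the following is a minimal sketch and not the statistic or search procedure introduced in the paper. It assumes a zero-mean Gaussian process null with a known covariance K, uses the textbook generalized log-likelihood ratio for a constant positive mean shift on a subset S, namely (s^T K^{-1} y)^2 / (2 s^T K^{-1} s) with s the indicator of S, and replaces efficient subset scanning with an exhaustive scan over contiguous 1-D windows. The function names, kernel choice, and hyperparameters are hypothetical.

```python
import numpy as np


def rbf_kernel(x, length_scale=2.0, signal_var=1.0, noise_var=0.5):
    """Squared-exponential covariance plus iid noise (hypothetical hyperparameters)."""
    d = x[:, None] - x[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2) + noise_var * np.eye(len(x))


def window_score(y, K_inv, idx):
    """Generalized LLR for a positive mean shift on idx, assuming y ~ N(0, K) under the null."""
    s = np.zeros(len(y))
    s[idx] = 1.0
    a = s @ K_inv @ y  # s^T K^{-1} y
    b = s @ K_inv @ s  # s^T K^{-1} s, positive because K is positive definite
    return max(a, 0.0) ** 2 / (2.0 * b)


def scan_windows(y, K, max_len=10):
    """Exhaustively score every contiguous window [start, stop) up to max_len points."""
    K_inv = np.linalg.inv(K)
    n = len(y)
    best_score, best_window = -np.inf, None
    for start in range(n):
        for length in range(1, max_len + 1):
            stop = start + length
            if stop > n:
                break
            score = window_score(y, K_inv, np.arange(start, stop))
            if score > best_score:
                best_score, best_window = score, (start, stop)
    return best_window, best_score


# Illustrative run: correlated null data with a mean shift injected on points 10..14.
rng = np.random.default_rng(0)
x = np.arange(40, dtype=float)
K = rbf_kernel(x)
y = rng.multivariate_normal(np.zeros(len(x)), K)
y[10:15] += 3.0
print(scan_windows(y, K))
```

Because the score weights observations through K^{-1}, a run of moderately high values that the covariance already explains contributes less evidence than it would under an independence assumption, which is the motivation the abstract gives for modeling correlation explicitly.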