Anomaly Detection in Hierarchical Data Streams under Unknown Models
This work addresses anomaly detection in hierarchical data streams, with applications such as heavy hitter detection and adaptive sampling, and presents an incremental improvement backed by order-optimality guarantees on sample complexity.
The paper tackles the problem of detecting a few targets in hierarchical data streams with unknown, potentially heavy-tailed distributions. It proposes an active inference strategy that minimizes sample complexity under a reliability constraint, and it establishes the strategy's order optimality in both the size of the search space and the reliability requirement.
We consider the problem of detecting a few targets among a large number of hierarchical data streams. The data streams are modeled as random processes with unknown and potentially heavy-tailed distributions. The objective is to design an active inference strategy that sequentially determines which data stream to sample so as to minimize the sample complexity under a reliability constraint. We propose an active inference strategy that induces a biased random walk on the tree-structured hierarchy based on confidence bounds of sample statistics. We then establish its order optimality in terms of both the size of the search space (i.e., the number of data streams) and the reliability requirement. The results find applications in hierarchical heavy hitter detection, noisy group testing, and adaptive sampling for active learning, classification, and stochastic root finding.
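The random-walk idea described above can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions, not the paper's algorithm: a single target, a complete binary tree, a node mean of 1 when its subtree contains the target and 0 otherwise, unit-variance Gaussian noise (so a sub-Gaussian confidence radius applies; heavy-tailed streams would need a robust estimator such as a truncated or median-of-means mean), and a decision threshold of 0.5. All function and parameter names (`confidence_radius`, `local_test`, `find_target`, `step_delta`) are hypothetical.

```python
import math
import random


def confidence_radius(n, delta, sigma=1.0):
    """Two-sided confidence radius for the mean of n sub-Gaussian samples.

    Assumption: noise is sigma-sub-Gaussian. Heavy-tailed data would need
    a robust mean estimator and a different radius.
    """
    return sigma * math.sqrt(2.0 * math.log(2.0 / delta) / n)


def local_test(sample, threshold, delta, max_samples=10_000):
    """Sample until the confidence interval excludes the threshold.

    Returns (is_above_threshold, samples_used). The per-step error budget
    delta / (n * (n + 1)) sums to at most delta over all n (union bound).
    """
    total = 0.0
    mean = 0.0
    for n in range(1, max_samples + 1):
        total += sample()
        mean = total / n
        radius = confidence_radius(n, delta / (n * (n + 1)))
        if mean - radius > threshold:
            return True, n
        if mean + radius < threshold:
            return False, n
    # Budget exhausted: fall back to the plain sample mean.
    return mean > threshold, max_samples


def find_target(depth, target, rng, delta=0.01, step_delta=0.2, max_steps=1000):
    """Walk a complete binary tree with 2**depth leaves toward the target leaf.

    Node (level, index) covers leaves [index * w, (index + 1) * w), where
    w = 2**(depth - level). Returns (declared_leaf_or_None, samples_used).
    """
    def sampler(level, index):
        width = 1 << (depth - level)
        has_target = index * width <= target < (index + 1) * width
        mean = 1.0 if has_target else 0.0  # assumed signal model
        return lambda: mean + rng.gauss(0.0, 1.0)

    level, index, used = 0, 0, 0
    for _ in range(max_steps):
        if level == depth:
            # At a leaf: run a high-confidence test before declaring.
            is_target, n = local_test(sampler(level, index), 0.5, delta)
            used += n
            if is_target:
                return index, used
            level, index = level - 1, index // 2  # false lead, back up
            continue
        moved = False
        for child in (2 * index, 2 * index + 1):
            # Cheap, low-confidence test: only needs to be right more
            # often than wrong for the walk to drift toward the target.
            passed, n = local_test(sampler(level + 1, child), 0.5, step_delta)
            used += n
            if passed:
                level, index, moved = level + 1, child, True
                break
        if not moved and level > 0:
            level, index = level - 1, index // 2  # neither child passed
    return None, used
```

For example, `find_target(depth=6, target=37, rng=random.Random(0))` searches 64 streams. The walk is biased toward the target because each low-confidence test at an internal node is correct with probability above one half, so occasional wrong moves are undone by stepping back to the parent, while only the leaf-level test has to meet the stricter reliability level `delta`.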