ML · LG · Sep 12, 2016

Online Data Thinning via Multi-Subspace Tracking

arXiv:1609.03544v1 · 11 citations
Originality: Incremental advance
AI Analysis

This paper addresses data overload for analysts in domains such as imagery and e-mail. It is an incremental advance, building on existing subspace tracking and anomaly detection techniques.

The paper tackles the problem of processing large-scale streaming data by proposing an online data thinning method that preserves unique or anomalous elements for expert analysis, using dynamic low-rank Gaussian mixture models to achieve scalability and real-time operation.
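As a rough illustration of the thinning loop, here is a minimal Python sketch. It assumes a stream of fixed-length numeric vectors processed in mini-batches. The names `thin_stream`, `RunningMeanModel`, `threshold`, and `batch_size` are hypothetical. The scorer is a deliberately simple stand-in (squared distance to a running mean), not the paper's low-rank Gaussian mixture model. Each batch is scored with the current model, elements above the threshold are passed on for analysis, and only the remaining typical elements update the model.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Vector = Sequence[float]


def _thin_batch(
    batch: List[Vector],
    score: Callable[[Vector], float],
    update: Callable[[List[Vector]], None],
    threshold: float,
) -> List[Vector]:
    scores = [score(x) for x in batch]  # score the whole batch with the current model
    flagged = [x for x, s in zip(batch, scores) if s > threshold]
    typical = [x for x, s in zip(batch, scores) if s <= threshold]
    if typical:
        update(typical)  # adapt the model using typical elements only
    return flagged


def thin_stream(
    stream: Iterable[Vector],
    score: Callable[[Vector], float],
    update: Callable[[List[Vector]], None],
    threshold: float,
    batch_size: int = 4,
) -> Iterator[Vector]:
    """Yield only the elements of `stream` whose score exceeds `threshold`."""
    batch: List[Vector] = []
    for x in stream:
        batch.append(x)
        if len(batch) == batch_size:
            yield from _thin_batch(batch, score, update, threshold)
            batch = []
    if batch:  # flush a final partial batch
        yield from _thin_batch(batch, score, update, threshold)


class RunningMeanModel:
    """Hypothetical placeholder scorer: squared distance to a running mean.
    Not the paper's low-rank Gaussian mixture model."""

    def __init__(self, dim: int) -> None:
        self.mean = [0.0] * dim  # starts at the origin (an arbitrary assumption)
        self.count = 0

    def score(self, x: Vector) -> float:
        return sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean))

    def update(self, xs: List[Vector]) -> None:
        for x in xs:  # incremental mean update, one element at a time
            self.count += 1
            self.mean = [mi + (xi - mi) / self.count for xi, mi in zip(x, self.mean)]
```

Excluding flagged elements from the update is one design choice among several. It keeps a burst of anomalies from quickly being absorbed as "normal", at the cost of adapting more slowly when the data distribution genuinely shifts.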

In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in datacenters. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of this proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices associated with the Gaussian components are modeled as low rank. Under this model, most observations lie near a union of subspaces, one for each mixture component. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering and subspace tracking allow the proposed method to adapt to dynamic environments. Furthermore, the proposed method allows subsampling, is robust to missing data, and uses a mini-batch online optimization approach. The resulting algorithms are scalable, efficient, and capable of operating in real time. Experiments on wide-area motion imagery and e-mail databases illustrate the efficacy of the proposed approach.
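Under the union-of-subspaces view, an observation can be scored by how far it lies from the nearest subspace, using only its observed entries. The Python sketch below is a simplified proxy for that idea, not the paper's algorithm. It assumes each component is summarized by a mean vector and a d-by-r basis matrix, with missing entries marked `None`. It scores by per-entry residual energy rather than by the full Gaussian likelihood, and it omits the subspace-tracking updates entirely. The names `solve`, `residual_energy`, and `anomaly_score`, and the small `ridge` term, are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # row-major


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the small square system a @ w = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def residual_energy(
    x: Sequence[Optional[float]],
    mean: Sequence[float],
    basis: Matrix,
    ridge: float = 1e-9,
) -> float:
    """Mean squared residual of x's observed entries after least-squares
    projection onto the affine subspace mean + span(basis).

    `basis` is d-by-r; entries of x that are None are treated as missing.
    """
    obs = [i for i, xi in enumerate(x) if xi is not None]
    if not obs:
        raise ValueError("x has no observed entries")
    y = [x[i] - mean[i] for i in obs]
    u = [basis[i] for i in obs]  # basis rows restricted to observed entries
    r = len(basis[0])
    # Normal equations (U^T U + ridge*I) w = U^T y; the small ridge keeps the
    # system solvable when U restricted to the observed rows is rank-deficient.
    gram = [[sum(row[p] * row[q] for row in u) + (ridge if p == q else 0.0)
             for q in range(r)] for p in range(r)]
    rhs = [sum(row[p] * yi for row, yi in zip(u, y)) for p in range(r)]
    w = solve(gram, rhs)
    res = [yi - sum(row[p] * w[p] for p in range(r)) for row, yi in zip(u, y)]
    return sum(e * e for e in res) / len(obs)  # normalize by observed count


def anomaly_score(
    x: Sequence[Optional[float]],
    components: Sequence[Tuple[Sequence[float], Matrix]],
) -> Tuple[float, int]:
    """Return (score, k): the residual energy to the best-fitting component k.
    A large score means x lies far from every subspace in the union."""
    scores = [residual_energy(x, mean, basis) for mean, basis in components]
    k = min(range(len(scores)), key=scores.__getitem__)
    return scores[k], k
```

Dividing by the number of observed entries keeps scores roughly comparable across observations with different amounts of missing data. A thresholded version of this score could serve as the `score` step in an online thinning loop.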

