Sketching Multidimensional Time Series for Fast Discord Mining
This work addresses a scalability bottleneck for data analysts in domains such as water treatment and transportation, and it enables real-time exploration of "what-if" scenarios. The contribution is incremental in that it builds on existing matrix profile techniques.
The paper tackles the high computational cost of matrix profile computation for discord mining in multidimensional time series. It proposes a sketching method that improves throughput by at least 50X with minimal impact on solution quality.
Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discords effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of the time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with the dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multidimensional time series. After an initial preprocessing step that builds the sketch about as fast as reading the data, the discord mining has a runtime independent of the dimensionality of the original data. On several real-world examples from water treatment and transportation, the proposed algorithm improves the throughput by at least an order of magnitude (50X) and has only minimal impact on the quality of the approximated solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.
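The abstract does not spell out how the sketch is constructed. As a rough illustration only, the snippet below assumes a count-sketch-style construction: each dimension is hashed to one of k buckets with a random sign, and the signed series are summed per bucket. This is an assumption for illustration, not necessarily the authors' method. The names `DimensionSketch`, `bucket_and_sign`, `matrix_profile`, and `top_discord` are hypothetical, and the matrix profile here is a plain brute-force version rather than an optimized one. Under these assumptions, building the sketch takes one pass over the data, discord mining touches only the k sketched rows, and adding or deleting a dimension is a single O(n) update because the sketch is linear.

```python
import hashlib
import math


def bucket_and_sign(dim_id, k, seed=0):
    """Hash a dimension id to a (bucket, sign) pair, count-sketch style (assumed)."""
    digest = hashlib.blake2b(f"{seed}:{dim_id}".encode(), digest_size=8).digest()
    bucket = int.from_bytes(digest[:4], "big") % k
    sign = 1.0 if digest[4] & 1 else -1.0
    return bucket, sign


class DimensionSketch:
    """k signed sums of the original dimensions (hypothetical helper)."""

    def __init__(self, n, k, seed=0):
        self.k, self.seed = k, seed
        self.rows = [[0.0] * n for _ in range(k)]

    def add(self, dim_id, values, weight=1.0):
        # One pass over a single dimension: O(n), regardless of how many exist.
        bucket, sign = bucket_and_sign(dim_id, self.k, self.seed)
        row = self.rows[bucket]
        for t, v in enumerate(values):
            row[t] += weight * sign * v

    def remove(self, dim_id, values):
        # The sketch is linear, so deleting a dimension is a negated add.
        self.add(dim_id, values, weight=-1.0)


def znorm(window):
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std < 1e-12:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def matrix_profile(ts, m):
    """Brute-force O(n^2 m) matrix profile with an exclusion zone of m // 2."""
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) > m // 2:  # skip trivial matches near i
                best = min(best, math.dist(a, b))
        profile.append(best)
    return profile


def top_discord(sketch, m):
    """Return (score, bucket, start) of the largest profile value over all rows."""
    best = (-math.inf, -1, -1)
    for b, row in enumerate(sketch.rows):
        profile = matrix_profile(row, m)
        start = max(range(len(profile)), key=profile.__getitem__)
        best = max(best, (profile[start], b, start))
    return best
```

In this sketch, a "what-if" query such as dropping one sensor amounts to calling `remove` with that sensor's series and rerunning `top_discord`, whose cost depends on k and the series length but not on the original number of dimensions.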