Anomaly Detection for High-Dimensional Data Using Large Deviations Principle
This work addresses the challenge of scaling anomaly detection to high-dimensional data for applications like public health monitoring, though it appears incremental as it builds on existing large deviations theory.
The authors address anomaly detection in high-dimensional data, which often suffers from the curse of dimensionality, by proposing the Large Deviations Anomaly Detection (LAD) algorithm, which outperforms state-of-the-art methods on benchmark datasets. They then apply it to identify anomalous counties in COVID-19 data.
Most current anomaly detection methods suffer from the curse of dimensionality when dealing with high-dimensional data. We propose an anomaly detection algorithm that can scale to high-dimensional data using concepts from the theory of large deviations. The proposed Large Deviations Anomaly Detection (LAD) algorithm is shown to outperform state-of-the-art anomaly detection methods on a variety of large and high-dimensional benchmark data sets. Exploiting the ability of the algorithm to scale to high-dimensional data, we propose an online anomaly detection method to identify anomalies in a collection of multivariate time series. We demonstrate the applicability of the online algorithm by identifying counties in the United States with anomalous trends in COVID-19-related cases and deaths. Several of the identified anomalous counties correspond to counties with a documented poor response to the COVID-19 pandemic.
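To give a rough sense of how a large-deviations argument can produce an anomaly score that grows with dimension, here is a minimal sketch. It is not the LAD algorithm from the paper, whose details the abstract does not give. It assumes, purely for illustration, that the d standardized features of an inlier behave like independent standard normals. Under that assumption, Cramér's theorem gives the rate function I(a) = (a - 1 - ln a) / 2 for the mean of squared standard normals, so the probability that the mean squared z-score reaches a > 1 decays roughly like exp(-d * I(a)). The function names `rate_chi2` and `ld_scores` are hypothetical.

```python
import math
import random
import statistics


def rate_chi2(a):
    """Cramer rate function for the sample mean of squared standard normals.

    Legendre transform of the log-MGF -0.5 * ln(1 - 2t), valid for a > 0.
    """
    if a <= 0:
        return math.inf
    return 0.5 * (a - 1.0 - math.log(a))


def ld_scores(rows):
    """Score each row by d * I(mean squared z-score).

    Assumption (illustrative, not from the paper): features are independent
    and roughly Gaussian, and only the upper tail (mean squared z-score > 1)
    counts as anomalous.
    """
    d = len(rows[0])
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    scores = []
    for row in rows:
        msq = sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds)) / d
        scores.append(d * rate_chi2(msq) if msq > 1.0 else 0.0)
    return scores


# Synthetic data: 200 Gaussian rows in 50 dimensions plus one planted outlier.
random.seed(0)
data = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
data.append([4.0] * 50)  # shifted in every coordinate
scores = ld_scores(data)
print("highest-scoring row:", scores.index(max(scores)))
```

Because the score scales with d times the rate function, a deviation of fixed size per coordinate becomes exponentially less likely as the dimension grows, which is the kind of property that makes large-deviations arguments attractive in high dimensions.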