On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data
This work addresses the challenge data scientists and engineers face in selecting appropriate anomaly detection models for real-time applications, though its contribution is incremental: it synthesizes and evaluates existing techniques rather than introducing new ones.
The paper tackles the problem of selecting anomaly detection techniques for real-time streaming data by analyzing their runtime and accuracy trade-offs on production datasets from various domains, providing a guide for choosing the best method.
The ever-growing volume and velocity of data, coupled with the decreasing attention span of end users, underscore the critical need for real-time analytics. In this regard, anomaly detection plays a key role, both as an application in its own right and as a means to verify data fidelity. Although anomaly detection has been researched for over 100 years in a multitude of disciplines, including but not limited to astronomy, statistics, manufacturing, econometrics, and marketing, most existing techniques cannot be used as-is on real-time data streams. Further, the lack of performance characterization on production data sets, with respect to both real-timeliness and accuracy, makes model selection very challenging. To this end, we present an in-depth analysis of anomaly detection techniques geared towards real-time streaming data. Given requirements on real-timeliness and accuracy, the analysis presented in this paper should serve as a guide for selecting the "best" anomaly detection technique. To the best of our knowledge, this is the first characterization of anomaly detection techniques proposed in a very diverse set of fields, using production data sets corresponding to a wide set of application domains.
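To make the runtime-accuracy characterization concrete, here is a minimal sketch of how one streaming detector could be measured on both axes at once. It assumes a simple online z-score detector whose mean and variance are updated incrementally with Welford's algorithm, so each point costs O(1) time and memory. This detector is an illustrative stand-in and is not claimed to be one of the techniques the paper evaluates. The names `OnlineZScoreDetector` and `evaluate`, the threshold of 3 standard deviations, the warm-up of 10 points, and the synthetic stream are all hypothetical choices.

```python
import math
import time


class OnlineZScoreDetector:
    """Hypothetical illustrative detector (not from the paper).

    Flags a point as anomalous when it lies more than `threshold` sample
    standard deviations from the running mean of all earlier points.
    Mean and variance are maintained with Welford's algorithm, so each
    update is O(1) in time and memory, which suits a real-time stream.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed cut-off, not from the paper
        self.warmup = warmup        # points to absorb before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Score x against the statistics seen so far, then absorb it."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample std of past points
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's incremental update of mean and m2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


def evaluate(detector, stream, labels):
    """Return (precision, recall, mean per-point latency in seconds)."""
    tp = fp = fn = 0
    elapsed = 0.0
    for x, is_true_anomaly in zip(stream, labels):
        start = time.perf_counter()
        flagged = detector.update(x)
        elapsed += time.perf_counter() - start
        if flagged and is_true_anomaly:
            tp += 1
        elif flagged:
            fp += 1
        elif is_true_anomaly:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, elapsed / max(len(stream), 1)


# Usage on a synthetic stream (hypothetical data, not from the paper):
# a repeating 10..14 pattern with three injected spikes of 100.
stream = [10.0 + (i % 5) for i in range(200)]
labels = [False] * len(stream)
for i in (50, 120, 180):
    stream[i], labels[i] = 100.0, True

precision, recall, latency = evaluate(OnlineZScoreDetector(), stream, labels)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"latency={latency * 1e6:.2f} us/point")
```

Reporting per-point latency alongside precision and recall puts both requirements, real-timeliness and accuracy, side by side for a single detector. Repeating the same loop for several candidate detectors on the same labeled stream yields the kind of comparison a model-selection guide would tabulate.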