AI · AP · Jun 20, 2012

Probabilistic Models for Anomaly Detection in Remote Sensor Data Streams

arXiv:1206.5250v1 · 38 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated data cleaning in ecological monitoring, though it is incremental as it applies an existing method to a specific domain.

The paper tackles the problem of cleaning remote sensor data streams by introducing a Dynamic Bayesian Network model to distinguish sensor failures from valid air temperature data, achieving precision and recall comparable to a domain expert.

Remote sensors are becoming the standard for observing and recording ecological data in the field. Such sensors can record data at fine temporal resolutions, and they can operate under extreme conditions prohibitive to human access. Unfortunately, sensor data streams exhibit many kinds of errors, ranging from corrupt communications to partial or total sensor failures. This means that the raw data stream must be cleaned before it can be used by domain scientists. In our application environment, the H.J. Andrews Experimental Forest, this data cleaning is performed manually. This paper introduces a Dynamic Bayesian Network model for analyzing sensor observations and distinguishing sensor failures from valid data for the case of air temperature measured at 15-minute time resolution. The model combines an accurate distribution of long-term and short-term temperature variations with a single generalized fault model. Experiments with historical data show that the precision and recall of the method are comparable to those of the domain expert. The system is currently being deployed to perform real-time automated data cleaning.
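The abstract does not spell out the network structure, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a two-state hidden sensor variable (working or faulty) that evolves as a Markov chain, an expected temperature supplied for each time step (standing in for the long-term and short-term temperature model), and a single broad Gaussian as the "generalized fault model". The function names and every parameter value (`std_ok`, `std_fault`, `p_stay_ok`, `p_stay_fault`, `prior_fault`) are hypothetical and chosen for illustration.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a normal distribution, used to avoid underflow."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def filter_fault_probability(observations, expected,
                             std_ok=1.0, std_fault=20.0,
                             p_stay_ok=0.99, p_stay_fault=0.9,
                             prior_fault=0.01):
    """Forward-filter P(sensor faulty at t | readings up to t).

    All parameters are illustrative assumptions, not values from the paper.
    """
    p_fault = prior_fault
    posteriors = []
    for obs, exp_temp in zip(observations, expected):
        # Predict: propagate the fault belief through the Markov transition.
        pred_fault = p_fault * p_stay_fault + (1.0 - p_fault) * (1.0 - p_stay_ok)
        pred_ok = 1.0 - pred_fault

        # Update: a working sensor reads close to the expected temperature,
        # while a faulty sensor is modelled by one broad Gaussian.
        log_ok = math.log(pred_ok) + log_gaussian(obs, exp_temp, std_ok)
        log_fault = math.log(pred_fault) + log_gaussian(obs, exp_temp, std_fault)

        # Normalize in a numerically stable way.
        shift = max(log_ok, log_fault)
        w_ok = math.exp(log_ok - shift)
        w_fault = math.exp(log_fault - shift)
        p_fault = w_fault / (w_ok + w_fault)
        posteriors.append(p_fault)
    return posteriors


def flag_faults(posteriors, threshold=0.5):
    """Mark readings whose fault probability exceeds a (hypothetical) threshold."""
    return [p > threshold for p in posteriors]
```

As a usage example, calling `filter_fault_probability([10.0, 10.5, 11.0, 45.0, 11.5], [10.0, 10.4, 10.8, 11.2, 11.6])` and passing the result to `flag_faults` marks only the 45.0 reading as a suspected fault. Comparing such flags against an expert's manual labels is one way precision and recall could be computed for a cleaner of this kind.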

