LG, Sep 1, 2021

Streaming data preprocessing via online tensor recovery for large environmental sensor networks

arXiv:2109.00596v1, 16 citations
Originality: Incremental advance
AI Analysis

This work addresses data cleaning challenges for large-scale environmental sensor networks, offering an incremental improvement with online processing and structured outlier detection.

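The paper tackles the preprocessing of streaming high-dimensional urban environmental sensor data by proposing an online robust tensor recovery (OLRTR) method. It reports a recovery error of 0.05 on a synthetically degraded National Oceanic and Atmospheric Administration temperature dataset, and superior results on the Array of Things city-scale sensor network in Chicago, IL, compared with several established online and batch-based low-rank decomposition methods.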

Measuring the built and natural environment at a fine-grained scale is now possible with low-cost urban environmental sensor networks. However, fine-grained city-scale data analysis is complicated by tedious data cleaning including removing outliers and imputing missing data. While many methods exist to automatically correct anomalies and impute missing entries, challenges still exist on data with large spatial-temporal scales and shifting patterns. To address these challenges, we propose an online robust tensor recovery (OLRTR) method to preprocess streaming high-dimensional urban environmental datasets. A small-sized dictionary that captures the underlying patterns of the data is computed and constantly updated with new data. OLRTR enables online recovery for large-scale sensor networks that provide continuous data streams, with a lower computational memory usage compared to offline batch counterparts. In addition, we formulate the objective function so that OLRTR can detect structured outliers, such as faulty readings over a long period of time. We validate OLRTR on a synthetically degraded National Oceanic and Atmospheric Administration temperature dataset, with a recovery error of 0.05, and apply it to the Array of Things city-scale sensor network in Chicago, IL, showing superior results compared with several established online and batch-based low rank decomposition methods.
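To make the idea of a small dictionary that is "constantly updated with new data" more concrete, the following is a minimal sketch of a generic online robust low-rank recovery loop in NumPy. It is not the authors' OLRTR implementation: it works on one vector of sensor readings per time step rather than on a tensor, it uses a plain entrywise L1 penalty instead of the structured-outlier penalty described in the abstract, and the class name `OnlineRobustRecovery`, the parameters `lam_coef` and `lam_sparse`, and the `relative_error` definition are all assumptions made for illustration. The abstract does not define "recovery error", so the relative Frobenius error below is only one plausible choice.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise soft-thresholding, the proximal operator of tau * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def relative_error(estimate, truth):
    """Relative Frobenius error (an assumed metric, not taken from the paper)."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)


class OnlineRobustRecovery:
    """Hypothetical matrix-valued sketch: z_t is modeled as D @ r_t + e_t."""

    def __init__(self, n_sensors, rank, lam_coef=0.1, lam_sparse=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Small dictionary of shape (n_sensors, rank); memory does not grow with time.
        self.D = rng.standard_normal((n_sensors, rank)) / np.sqrt(n_sensors)
        self.A = np.zeros((rank, rank))       # running sum of r r^T
        self.B = np.zeros((n_sensors, rank))  # running sum of cleaned_z r^T
        self.lam_coef = lam_coef
        self.lam_sparse = lam_sparse

    def _fit_sample(self, z, mask, n_iter=50):
        # Alternate between ridge coefficients r and sparse outliers e,
        # using only the observed entries of z.
        D_obs, z_obs = self.D[mask], z[mask]
        e_obs = np.zeros_like(z_obs)
        G = D_obs.T @ D_obs + self.lam_coef * np.eye(D_obs.shape[1])
        for _ in range(n_iter):
            r = np.linalg.solve(G, D_obs.T @ (z_obs - e_obs))
            e_obs = soft_threshold(z_obs - D_obs @ r, self.lam_sparse)
        return r, e_obs

    def partial_fit(self, z, mask=None):
        """Process one time step; NaN entries are treated as missing."""
        z = np.asarray(z, dtype=float)
        mask = ~np.isnan(z) if mask is None else np.asarray(mask, dtype=bool)
        r, e_obs = self._fit_sample(z, mask)

        low_rank = self.D @ r           # recovered (and imputed) readings
        cleaned = low_rank.copy()       # missing entries are filled with the model
        cleaned[mask] = z[mask] - e_obs  # observed entries minus detected outliers

        # Accumulate sufficient statistics, then update the dictionary with
        # one pass of block coordinate descent over its columns.
        self.A += np.outer(r, r)
        self.B += np.outer(cleaned, r)
        A_reg = self.A + self.lam_coef * np.eye(len(r))
        for j in range(len(r)):
            self.D[:, j] += (self.B[:, j] - self.D @ A_reg[:, j]) / A_reg[j, j]

        outliers = np.zeros_like(z)
        outliers[mask] = e_obs
        return low_rank, outliers
```

In use, each new vector of readings would be passed to `partial_fit`, which returns the recovered readings and the estimated outlier component for that step. Only the dictionary `D` and the two accumulators `A` and `B` are kept between steps, which illustrates why an online scheme of this kind can need less memory than a batch method that stores the full history.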

Code Implementations: 1 repo