AI · Jun 29, 2023

Computationally Assisted Quality Control for Public Health Data Streams

arXiv:2306.16914v2 · 4 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses data quality issues for public health stakeholders, but the advance is incremental: it adapts existing outlier detection methods to a specific domain.

The paper tackles irregularities in public health data streams, such as COVID-19 cases, by developing FlaSH, a practical outlier detection framework. FlaSH scales to high data volumes and matches or exceeds existing methods in mean accuracy, and human experts rate the outliers it identifies as more helpful.

Irregularities in public health data streams (like COVID-19 Cases) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important, outlying data points from thousands of daily-updated public health data streams could assist an expert reviewer in identifying these irregularities. However, existing outlier detection frameworks perform poorly on this task because they do not account for the data volume or for the statistical properties of public health streams. Accordingly, we developed FlaSH (Flagging Streams in public Health), a practical outlier detection framework for public health data users that uses simple, scalable models to capture these statistical properties explicitly. In an experiment where human experts evaluate FlaSH and existing methods (including deep learning approaches), FlaSH scales to the data volume of this task, matches or exceeds these other methods in mean accuracy, and identifies the outlier points that users empirically rate as more helpful. Based on these results, FlaSH has been deployed on data streams used by public health stakeholders.
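To make the task concrete, the following is a minimal sketch of ranking the most outlying points across many daily-updated streams. It is not FlaSH's model: it swaps in a generic rolling median/MAD score as a stand-in, and the names `robust_scores`, `top_outliers`, and the `window` and `k` parameters are hypothetical choices made for illustration only.

```python
from statistics import median

def robust_scores(values, window=7):
    """Score each point against the median of the preceding `window` points.

    Assumption: a rolling median/MAD z-score, used here only as a generic
    stand-in. It is NOT the model described in the paper.
    """
    scores = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 3:
            # Too little history to judge; treat the point as unremarkable.
            scores.append(0.0)
            continue
        med = median(history)
        mad = median([abs(h - med) for h in history])
        # 1.4826 rescales MAD to match a standard deviation under normality;
        # fall back to 1.0 when the history is constant (MAD of zero).
        scale = 1.4826 * mad if mad > 0 else 1.0
        scores.append(abs(x - med) / scale)
    return scores

def top_outliers(streams, k=3, window=7):
    """Return the k highest-scoring (stream name, index, score) triples."""
    ranked = []
    for name, values in streams.items():
        for i, score in enumerate(robust_scores(values, window)):
            ranked.append((score, name, i))
    ranked.sort(reverse=True)
    return [(name, i, round(score, 2)) for score, name, i in ranked[:k]]

# Hypothetical daily counts for two streams; one contains an obvious spike.
example_streams = {
    "region_a_cases": [10, 11, 10, 12, 11, 10, 100, 11],
    "region_b_cases": [5, 5, 6, 5, 6, 5, 6, 5],
}
for name, index, score in top_outliers(example_streams, k=2):
    print(name, index, score)
```

Ranking all points on one shared scale, rather than flagging per stream, is what lets a reviewer look only at a short list drawn from thousands of streams. A real system would also need to handle the statistical properties the abstract mentions, which this sketch ignores.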

Code Implementations: 2 repos
