Graph-Based Prediction Models for Data Debiasing
The paper addresses reporting bias in data from critical domains such as healthcare and public safety and offers a method for more reliable decision-making, though the contribution appears incremental because it builds on established graph-based and optimization techniques.
The paper tackles bias from under- and over-reporting by introducing GROUD, a graph-based optimization framework that jointly estimates true incident counts and reporting bias probabilities. It reports robust and superior recovery of debiased counts on simulated data and on real-world datasets, including Atlanta emergency calls and COVID-19 vaccine adverse event reports.
Bias in data collection, arising from both under-reporting and over-reporting, poses significant challenges in critical applications such as healthcare and public safety. In this work, we introduce Graph-based Over- and Under-reporting Debiasing (GROUD), a novel graph-based optimization framework that debiases reported data by jointly estimating the true incident counts and the associated reporting bias probabilities. By modeling the bias as a smooth signal over a graph constructed from geophysical or feature-based similarities, our convex formulation not only ensures a unique solution but also comes with theoretical recovery guarantees under certain assumptions. We validate GROUD on both challenging simulated experiments and real-world datasets -- including Atlanta emergency calls and COVID-19 vaccine adverse event reports -- demonstrating its robustness and superior performance in accurately recovering debiased counts. This approach paves the way for more reliable downstream decision-making in systems affected by reporting irregularities.
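To make the idea of a "smooth signal over a graph" with a unique convex solution more concrete, the sketch below solves a generic Laplacian-regularized least-squares problem. It is not the GROUD objective, which the abstract does not spell out. The function names, the 4-node path graph, the noisy per-node ratios, and the weight `lam` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def graph_laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix W."""
    weights = np.asarray(weights, dtype=float)
    return np.diag(weights.sum(axis=1)) - weights


def smooth_over_graph(noisy_signal, weights, lam=1.0):
    """Minimize ||b - y||^2 + lam * b^T L b over b, where y is noisy_signal.

    For lam >= 0 the matrix I + lam * L is positive definite, so the
    objective is strictly convex and the minimizer is the unique solution
    of the linear system (I + lam * L) b = y.
    """
    y = np.asarray(noisy_signal, dtype=float)
    L = graph_laplacian(weights)
    return np.linalg.solve(np.eye(L.shape[0]) + lam * L, y)


# Hypothetical path graph 0 - 1 - 2 - 3 (e.g. four adjacent regions).
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

# Hypothetical noisy per-node reporting ratios (illustrative values only).
noisy_ratio = np.array([0.5, 0.9, 0.6, 1.4])

L = graph_laplacian(W)
smooth_ratio = smooth_over_graph(noisy_ratio, W, lam=2.0)

# The Laplacian quadratic form measures roughness across edges; the
# smoothed signal should score lower than the noisy input.
roughness_before = noisy_ratio @ L @ noisy_ratio
roughness_after = smooth_ratio @ L @ smooth_ratio
print(roughness_before, roughness_after)
```

Increasing `lam` pulls neighboring nodes toward a common value, while `lam = 0` returns the input unchanged. A full debiasing method would additionally have to estimate the true counts jointly with the bias signal, as the abstract describes.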