A Bayesian Spatial Model to Correct Under-Reporting in Urban Crowdsourcing
This work addresses inequities in government services due to biased reporting, offering a method to correct under-reporting for more proactive and fair resource allocation.
The paper tackles the problem of under-reporting in urban crowdsourcing by developing a Bayesian spatial latent variable model to infer true event occurrences from incomplete reports, applied to storm-induced flooding in New York City. The model improves prediction accuracy and leads to more equitable inspections, better serving non-white, less educated, and lower-income residents.
Decision-makers often observe the occurrence of events through a reporting process. City governments, for example, rely on resident reports to find and then resolve urban infrastructural problems such as fallen street trees, flooded basements, or rat infestations. Without additional assumptions, there is no way to distinguish events that occur but are not reported from events that truly did not occur--a fundamental problem in settings with positive-unlabeled data. Because disparities in reporting rates correlate with resident demographics, addressing incidents only on the basis of reports leads to systematic neglect in neighborhoods that are less likely to report events. We show how to overcome this challenge by leveraging the fact that events are spatially correlated. Our framework uses a Bayesian spatial latent variable model to infer event occurrence probabilities and applies it to storm-induced flooding reports in New York City, further pooling results across multiple storms. We show that a model accounting for under-reporting and spatial correlation predicts future reports more accurately than other models, and further induces a more equitable set of inspections: its allocations better reflect the population and provide equitable service to non-white, less traditionally educated, and lower-income residents. This finding reflects heterogeneous reporting behavior learned by the model: reporting rates are higher in Census tracts with higher populations, proportions of white residents, and proportions of owner-occupied households. Our work lays the groundwork for more equitable proactive government services, even with disparate reporting behavior.