LG · Jun 10, 2025

Urban Incident Prediction with Graph Neural Networks: Integrating Government Ratings and Crowdsourced Reports

Sidhika Balachandar, Shuvom Sadhuka, Bonnie Berger, Emma Pierson, Nikhil Garg

arXiv:2506.08740v2 · 4.11 citations · h-index: 6 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of accurately forecasting urban incidents for government officials and offers a method for handling heterogeneous, sparse, and biased data. The advance is incremental, since it builds on existing GNN approaches.

The paper predicts urban incidents such as potholes or rodent issues by integrating sparse government inspection ratings with dense but biased crowdsourced reports. Its multiview, multioutput GNN-based model predicts the latent state more accurately than models that use only one of the two data sources, especially when rating data is sparse and reports are predictive of ratings. The results are demonstrated on a New York City dataset of more than 9.6 million reports and 1 million ratings.

Graph neural networks (GNNs) are widely used in urban spatiotemporal forecasting, such as predicting infrastructure problems. In this setting, government officials wish to know in which neighborhoods incidents like potholes or rodent issues occur. The true state of incidents (e.g., street conditions) for each neighborhood is observed via government inspection ratings. However, these ratings are only conducted for a sparse set of neighborhoods and incident types. We also observe the state of incidents via crowdsourced reports, which are more densely observed but may be biased due to heterogeneous reporting behavior. First, for such settings, we propose a multiview, multioutput GNN-based model that uses both unbiased rating data and biased reporting data to predict the true latent state of incidents. Second, we investigate a case study of New York City urban incidents and collect, standardize, and make publicly available a dataset of 9,615,863 crowdsourced reports and 1,041,415 government inspection ratings over 3 years and across 139 types of incidents. Finally, we show on both real and semi-synthetic data that our model can better predict the latent state compared to models that use only reporting data or models that use only rating data, especially when rating data is sparse and reports are predictive of ratings. We also quantify demographic biases in crowdsourced reporting, e.g., higher-income neighborhoods report problems at higher rates. Our analysis showcases a widely applicable approach for latent state prediction using heterogeneous, sparse, and biased data.
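The two-view idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the single graph-convolution layer, the linear heads, the loss terms, and every name here (`forward`, `masked_loss`, `w_demo`, `alpha`, the toy graph) are assumptions chosen for illustration. A shared embedding produces a latent score per neighborhood. A rating head reads the latent score directly and is supervised only where an inspection exists, while a report head adds a demographic term to model biased reporting.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(adj, feats, demo, params):
    """Return (rating prediction, report probability) for every node."""
    a_norm = normalize_adjacency(adj)
    hidden = np.tanh(a_norm @ feats @ params["w_gcn"])  # shared node embedding
    latent = hidden @ params["w_latent"]                # latent incident state, shape (N,)
    rating_pred = latent                                # assumed unbiased view of the latent state
    # Assumed biased view: reports depend on the latent state plus demographics.
    report_prob = sigmoid(params["alpha"] * latent + demo @ params["w_demo"])
    return rating_pred, report_prob

def masked_loss(rating_pred, report_prob, ratings, rating_mask, reports):
    """Squared error on rated nodes only, plus cross-entropy on reports for all nodes."""
    n_rated = max(rating_mask.sum(), 1.0)
    rating_loss = ((rating_pred - ratings) ** 2 * rating_mask).sum() / n_rated
    eps = 1e-9
    report_loss = -np.mean(
        reports * np.log(report_prob + eps)
        + (1.0 - reports) * np.log(1.0 - report_prob + eps)
    )
    return rating_loss + report_loss

# Toy example: 4 neighborhoods on a path graph, ratings observed for nodes 0 and 3 only.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 3))                 # hypothetical node features
demo = rng.normal(size=(4, 2))                  # hypothetical demographic covariates
params = {
    "w_gcn": rng.normal(scale=0.1, size=(3, 5)),
    "w_latent": rng.normal(scale=0.1, size=5),
    "w_demo": rng.normal(scale=0.1, size=2),
    "alpha": 1.0,
}
ratings = np.array([0.8, 0.0, 0.0, 0.2])        # entries at unrated nodes are placeholders
rating_mask = np.array([1.0, 0.0, 0.0, 1.0])
reports = np.array([1.0, 1.0, 0.0, 0.0])

rating_pred, report_prob = forward(adj, feats, demo, params)
loss = masked_loss(rating_pred, report_prob, ratings, rating_mask, reports)
```

Because the rating term is masked, neighborhoods without an inspection contribute only through the report term and through message passing from their neighbors. In this sketch, that is how dense reports can inform predictions where ratings are missing.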
