Urban-R1: Reinforced MLLMs Mitigate Geospatial Biases for Urban General Intelligence
Urban-R1 addresses regionally skewed predictions in AI systems for urban environments and offers a pathway toward more equitable urban intelligence, although the contribution is incremental because it builds on existing MLLM methods.
The paper tackles geospatial bias in urban foundation models by proposing Urban-R1, a reinforcement learning-based post-training framework that aligns MLLMs with Urban General Intelligence objectives. Urban-R1 improves cross-region generalization and outperforms both SFT-trained and closed-source models.
Rapid urbanization intensifies the demand for Urban General Intelligence (UGI), referring to AI systems that can understand and reason about complex urban environments. Recent studies have built urban foundation models using supervised fine-tuning (SFT) of LLMs and MLLMs, yet these models exhibit persistent geospatial bias, producing regionally skewed predictions and limited generalization. To this end, we propose Urban-R1, a reinforcement learning-based post-training framework that aligns MLLMs with the objectives of UGI. Urban-R1 adopts Group Relative Policy Optimization (GRPO) to optimize reasoning across geographic groups and employs urban region profiling as a proxy task to provide measurable rewards from multimodal urban data. Extensive experiments across diverse regions and tasks show that Urban-R1 effectively mitigates geo-bias and improves cross-region generalization, outperforming both SFT-trained and closed-source models. Our results highlight reinforcement learning alignment as a promising pathway toward equitable and trustworthy urban intelligence.
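The abstract names GRPO and a reward derived from urban region profiling but gives no further detail. As a rough illustration, the sketch below shows the group-relative advantage that standard GRPO computes. Several responses are sampled for the same prompt, each is scored, and each score is standardized against the mean and standard deviation of its own group, so no learned value function is needed. The reward function `region_profiling_reward` is hypothetical. It assumes a numeric profiling target (for example, a normalized regional indicator) scored by relative error. The paper's actual reward design, and how its geographic groups map onto GRPO's sampling groups, are not specified here.

```python
import math
from typing import List


def region_profiling_reward(predicted: float, target: float) -> float:
    """HYPOTHETICAL reward for a numeric region-profiling target.

    Returns 1.0 for an exact match and decays linearly to 0.0 as the
    relative error grows to 100% or more. This is an illustrative
    assumption, not Urban-R1's published reward.
    """
    if target == 0:
        return 1.0 if predicted == 0 else 0.0
    rel_err = abs(predicted - target) / abs(target)
    return max(0.0, 1.0 - rel_err)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages for one group of responses to the same prompt.

    Each reward is standardized against the group's mean and (population)
    standard deviation; eps avoids division by zero when all rewards tie.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one region prompt whose target value is 100.0.
    predictions = [100.0, 90.0, 50.0, 150.0]
    rewards = [region_profiling_reward(p, 100.0) for p in predictions]
    advantages = group_relative_advantages(rewards)
    for p, r, a in zip(predictions, rewards, advantages):
        print(f"pred={p:6.1f} reward={r:.2f} advantage={a:+.3f}")
```

Here the four predictions receive rewards 1.0, 0.9, 0.5 and 0.5. The exact answer gets the largest positive advantage, and the two answers that are 50% off get equal negative advantages. The sketch uses the population standard deviation; implementations differ on this detail.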