From Pixels to Urban Policy-Intelligence: Recovering Legacy Effects of Redlining with a Multimodal LLM
The work gives urban policymakers a tool for tracking place-based interventions, though the contribution is incremental: it applies an existing method to a new domain.
The paper addresses the problem of measuring neighborhood characteristics for urban policy evaluation. It uses a multimodal LLM to infer poverty and tree canopy from street-view imagery, producing estimates that are statistically indistinguishable from authoritative sources and outperforming conventional pixel-based methods.
This paper shows how a multimodal large language model (MLLM) can expand urban measurement capacity and support tracking of place-based policy interventions. Using a structured, reason-then-estimate pipeline on street-view imagery, GPT-4o infers neighborhood poverty and tree canopy, which we embed in a quasi-experimental design evaluating the legacy of 1930s redlining. GPT-4o recovers the expected adverse socio-environmental legacy effects of redlining, with estimates statistically indistinguishable from authoritative sources, and it outperforms a conventional pixel-based segmentation baseline, consistent with the idea that holistic scene reasoning extracts higher-order information beyond object counts alone. These results position MLLMs as policy-grade instruments for neighborhood measurement and motivate broader validation across policy-evaluation settings.
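One way to read "statistically indistinguishable" is as a failure to reject equality between two effect estimates. The abstract does not state which test the paper uses, so the following is only a minimal sketch under stated assumptions: a two-sided z-test on the difference between an effect estimated from model-inferred measures and one estimated from an authoritative source, treating the two estimates as independent. The function name `difference_z_test` and all numbers are hypothetical and are not results from the paper. If both estimates come from the same neighborhoods, their covariance would also need to enter the denominator.

```python
import math


def difference_z_test(b1, se1, b2, se2):
    """Two-sided z-test of H0: b1 == b2.

    Assumes (hypothetically) that the two estimates are independent,
    so the variance of their difference is se1**2 + se2**2.
    Returns the z statistic and the two-sided p-value.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Illustrative, made-up inputs: an effect estimated from model-inferred
# outcomes versus the same effect estimated from an authoritative source.
z, p = difference_z_test(b1=0.050, se1=0.012, b2=0.046, se2=0.010)
print(f"z = {z:.3f}, p = {p:.3f}")
```

A large p-value here means only that the data do not distinguish the two estimates at the chosen level. It does not by itself establish equivalence, which would call for an explicit equivalence test with a pre-specified margin.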