Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models

arXiv:2605.08404
AI Analysis

For smart-city planners and researchers, this work suggests that combining remote sensing imagery with large vision-language models is feasible for built environment analysis, though the results are preliminary.

This work explores using large vision-language models (InternVL, Qwen) with remote sensing imagery for four built environment reasoning tasks: design suggestions, constructability assessment, land-use patterns, and risk identification. The results show potential for supporting smart-city decision-making.

This work investigates the use of large language models (LLMs) for tasks in smart cities. The core idea is to leverage remote sensing imagery to characterize the built environment, including design suggestions, constructability assessment, land-use patterns, and risk identification. We examine remote sensing imagery at multiple spatial scales as inputs for multimodal language modeling and evaluate the effect of scale on built-environment-related reasoning. In addition, we compare state-of-the-art LLMs, including InternVL and Qwen, in terms of accuracy and reliability when generating built environment recommendations. The results demonstrate the potential of integrating remote sensing imagery with large language models to support smart-city decision-making.
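As a rough illustration of the multi-scale setup described in the abstract, the sketch below computes square crop windows of several sizes around one site and pairs each window with the four reasoning tasks. It is a minimal sketch under stated assumptions, not the authors' pipeline: the names `CropWindow`, `multiscale_windows`, and `build_queries`, the pixel half-sizes, and the prompt wording are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass

# The four reasoning tasks named in the abstract.
TASKS = (
    "design suggestions",
    "constructability assessment",
    "land-use patterns",
    "risk identification",
)


@dataclass(frozen=True)
class CropWindow:
    """Pixel bounds of a crop (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def multiscale_windows(width, height, cx, cy, half_sizes):
    """Return one square window per half-size, centred on (cx, cy)
    and clipped to the image bounds."""
    return [
        CropWindow(
            left=max(0, cx - h),
            top=max(0, cy - h),
            right=min(width, cx + h),
            bottom=min(height, cy + h),
        )
        for h in half_sizes
    ]


def build_queries(windows, tasks=TASKS):
    """Pair every window with every task; the prompt text is an assumption."""
    return [
        (w, f"Task: {t}. Answer using only the remote sensing image provided.")
        for w in windows
        for t in tasks
    ]


# Example: three scales around the centre of a 1000 x 800 pixel scene.
windows = multiscale_windows(1000, 800, cx=500, cy=400, half_sizes=[64, 256, 1024])
queries = build_queries(windows)
```

Each (window, prompt) pair could then be sent to whichever vision-language model is under comparison, so that answers for the same site can be compared across scales and models.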
