28.9LGApr 24
Beyond Land Surface Temperature: Explainable Spatial Machine Learning Reveals Urban Morphology Effects on Human-Centric Heat StressYuan Wang, Shengao Yi, Xiaojiang Li et al.
Heat exposure connects the built environment and public health, directly shaping the livability and sustainability of urban areas. Understanding the spatial heterogeneity of heat exposure and its drivers is vital for climate-adaptive urban planning. However, most planning-oriented studies rely on land surface temperature (LST), and whether LST adequately represents human heat exposure and how it differs from physiologically relevant heat stress remains insufficiently examined. Here, adopting Landsat-retrieved 30-m LST and GPU-accelerated 1-m universal thermal climate index (UTCI) in Singapore, this study establishes a comprehensive "Modeling-Comparing-Assessing" framework to systematically evaluate the spatial and mechanistic discrepancies between the two metrics. We further investigate pronounced non-stationary and threshold-based quantitative relationships of the two metrics with urban factors by employing a novel geographically weighted XGBoost (GW-XGBoost) and generalized additive model (GAM) workflow. Our results demonstrate notable discrepancies in spatial patterns of LST and UTCI, along with substantial spatial heterogeneity in how 2D and 3D urban factors impact these two thermal metrics, as revealed by explainable GW-XGBoost models (global out-of-bag R2 = 0.855 for LST and 0.905 for UTCI, respectively). Crucially, spatially explicit SHAP interprets that sky view factor plays a central role in explaining UTCI variability but exhibits a comparatively marginal independent contribution to LST, indicating that LST inadequately captures shading-driven and radiative processes governing actual human heat stress. Notably, SHAP-GAM analysis indicates that higher albedo is associated with increased UTCI. These novel findings provide evidence for integrating physiologically relevant thermal indices to inform targeted heat risk management and climate-adaptive urban planning.
CVApr 1, 2025Code
OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View ImageryXiucheng Liang, Jinheng Xie, Tianhong Zhao et al.
Building properties, such as height, usage, and material, play a crucial role in spatial data infrastructures, supporting various urban applications. Despite their importance, comprehensive building attribute data remain scarce in many urban areas. Recent advances have enabled the extraction of objective building attributes using remote sensing and street-level imagery. However, establishing a pipeline that integrates diverse open datasets, acquires holistic building imagery, and infers comprehensive building attributes at scale remains a significant challenge. Among the first, this study bridges the gaps by introducing OpenFACADES, an open framework that leverages multimodal crowdsourced data to enrich building profiles with both objective attributes and semantic descriptors through multimodal large language models. First, we integrate street-level image metadata from Mapillary with OpenStreetMap geometries via isovist analysis, identifying images that provide suitable vantage points for observing target buildings. Second, we automate the detection of building facades in panoramic imagery and tailor a reprojection approach to convert objects into holistic perspective views that approximate real-world observation. Third, we introduce an innovative approach that harnesses and investigates the capabilities of open-source large vision-language models (VLMs) for multi-attribute prediction and open-vocabulary captioning in building-level analytics, leveraging a globally sourced dataset of 31,180 labeled images from seven cities. Evaluation shows that fine-tuned VLM excel in multi-attribute inference, outperforming single-attribute computer vision models and zero-shot ChatGPT-4o. Further experiments confirm its superior generalization and robustness across culturally distinct region and varying image conditions.
CVFeb 9
UrbanGraphEmbeddings: Learning and Evaluating Spatially Grounded Multimodal Embeddings for Urban ScienceJie Zhang, Xingtong Yu, Yuan Fang et al.
Learning transferable multimodal embeddings for urban environments is challenging because urban understanding is inherently spatial, yet existing datasets and benchmarks lack explicit alignment between street-view images and urban structure. We introduce UGData, a spatially grounded dataset that anchors street-view images to structured spatial graphs and provides graph-aligned supervision via spatial reasoning paths and spatial context captions, exposing distance, directionality, connectivity, and neighborhood context beyond image content. Building on UGData, we propose UGE, a two-stage training strategy that progressively and stably aligns images, text, and spatial structures by combining instruction-guided contrastive learning with graph-based spatial encoding. We finally introduce UGBench, a comprehensive benchmark to evaluate how spatially grounded embeddings support diverse urban understanding tasks -- including geolocation ranking, image retrieval, urban perception, and spatial grounding. We develop UGE on multiple state-of-the-art VLM backbones, including Qwen2-VL, Qwen2.5-VL, Phi-3-Vision, and LLaVA1.6-Mistral, and train fixed-dimensional spatial embeddings with LoRA tuning. UGE built upon Qwen2.5-VL-7B backbone achieves up to 44% improvement in image retrieval and 30% in geolocation ranking on training cities, and over 30% and 22% gains respectively on held-out cities, demonstrating the effectiveness of explicit spatial grounding for spatially intensive urban tasks.
CVDec 22, 2025
From Pixels to Predicates Structuring urban perception with scene graphsYunlong Liu, Shuyang Li, Pengyuan Liu et al.
Perception research is increasingly modelled using streetscapes, yet many approaches still rely on pixel features or object co-occurrence statistics, overlooking the explicit relations that shape human perception. This study proposes a three stage pipeline that transforms street view imagery (SVI) into structured representations for predicting six perceptual indicators. In the first stage, each image is parsed using an open-set Panoptic Scene Graph model (OpenPSG) to extract object predicate object triplets. In the second stage, compact scene-level embeddings are learned through a heterogeneous graph autoencoder (GraphMAE). In the third stage, a neural network predicts perception scores from these embeddings. We evaluate the proposed approach against image-only baselines in terms of accuracy, precision, and cross-city generalization. Results indicate that (i) our approach improves perception prediction accuracy by an average of 26% over baseline models, and (ii) maintains strong generalization performance in cross-city prediction tasks. Additionally, the structured representation clarifies which relational patterns contribute to lower perception scores in urban scenes, such as graffiti on wall and car parked on sidewalk. Overall, this study demonstrates that graph-based structure provides expressive, generalizable, and interpretable signals for modelling urban perception, advancing human-centric and context-aware urban analytics.
LGJun 25, 2024Code
SE-VGAE: Unsupervised Disentangled Representation Learning for Interpretable Architectural Layout Design Graph GenerationJielin Chen, Rudi Stouffs
Despite the suitability of graphs for capturing the relational structures inherent in architectural layout designs, there is a notable dearth of research on interpreting architectural design space using graph-based representation learning and exploring architectural design graph generation. Concurrently, disentangled representation learning in graph generation faces challenges such as node permutation invariance and representation expressiveness. To address these challenges, we introduce an unsupervised disentangled representation learning framework, Style-based Edge-augmented Variational Graph Auto-Encoder (SE-VGAE), aiming to generate architectural layout in the form of attributed adjacency multi-graphs while prioritizing representation disentanglement. The framework is designed with three alternative pipelines, each integrating a transformer-based edge-augmented encoder, a latent space disentanglement module, and a style-based decoder. These components collectively facilitate the decomposition of latent factors influencing architectural layout graph generation, enhancing generation fidelity and diversity. We also provide insights into optimizing the framework by systematically exploring graph feature augmentation schemes and evaluating their effectiveness for disentangling architectural layout representation through extensive experiments. Additionally, we contribute a new benchmark large-scale architectural layout graph dataset extracted from real-world floor plan images to facilitate the exploration of graph data-based architectural design representation space interpretation. This study pioneered disentangled representation learning for the architectural layout graph generation. The code and dataset of this study will be open-sourced.
LGAug 15, 2025
From Heuristics to Data: Quantifying Site Planning Layout Indicators with Deep Learning and Multi-Modal DataQian Cao, Jielin Chen, Junchao Zhao et al.
The spatial layout of urban sites shapes land-use efficiency and spatial organization. Traditional site planning often relies on experiential judgment and single-source data, limiting systematic quantification of multifunctional layouts. We propose a Site Planning Layout Indicator (SPLI) system, a data-driven framework integrating empirical knowledge with heterogeneous multi-source data to produce structured urban spatial information. The SPLI supports multimodal spatial data systems for analytics, inference, and retrieval by combining OpenStreetMap (OSM), Points of Interest (POI), building morphology, land use, and satellite imagery. It extends conventional metrics through five dimensions: (1) Hierarchical Building Function Classification, refining empirical systems into clear hierarchies; (2) Spatial Organization, quantifying seven layout patterns (e.g., symmetrical, concentric, axial-oriented); (3) Functional Diversity, transforming qualitative assessments into measurable indicators using Functional Ratio (FR) and Simpson Index (SI); (4) Accessibility to Essential Services, integrating facility distribution and transport networks for comprehensive accessibility metrics; and (5) Land Use Intensity, using Floor Area Ratio (FAR) and Building Coverage Ratio (BCR) to assess utilization efficiency. Data gaps are addressed through deep learning, including Relational Graph Neural Networks (RGNN) and Graph Neural Networks (GNN). Experiments show the SPLI improves functional classification accuracy and provides a standardized basis for automated, data-driven urban spatial analytics.