Zahratu Shabrina

7.0 | AI | Jul 16

AI vs Human Expert Reasoning: Assessing Agreements in Building Typology Predictions based on Street View Imagery

Zahratu Shabrina, Muhammad Asa, Jin Rui et al.

This research investigates the potential of Vision-Language Models (VLMs) to infer three building-typology attributes (Construction, Current Use, and Storeys) from Google Street View (GSV) images. Predictions generated by VLMs are compared with inferences made by human experts (civil engineers and architects), which serve as manually labelled ground-truth data. We evaluate several state-of-the-art VLMs, including GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Flash. By applying different scaling strategies and prompting techniques, we find that Chain-of-Thought prompts provide more stable model performance overall. We also investigate the reasoning behind the VLMs' building-typology predictions by examining the probabilities of keywords appearing in the AI explanations. This enables us to analyse patterns in this reasoning and identify key themes driving both agreements and disagreements between VLM and expert labels. We find that AI tends to focus on visual indicators, whereas human experts place greater emphasis on broader contextual cues and domain knowledge, in addition to visual cues. Overall, VLMs can approximate experts' capability in building-typology classification at scale, with an average accuracy of approximately 70%. The study demonstrates the potential of VLMs for automating tasks that require pattern recognition and object identification in an urban context. AI models have the potential to serve as complementary and collaborative tools for urban analysis, leveraging their strengths in understanding visual patterns. This study contributes to the exploration of the efficiency and scalability of AI visual prediction and provides insights into the reasoning processes that could support automation in urban analysis and prediction.
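As a rough illustration of how agreement between VLM predictions and expert labels might be scored, the sketch below computes plain accuracy and Cohen's kappa for one attribute. The abstract reports only an average accuracy; the use of Cohen's kappa, the function names, and the toy labels here are assumptions for illustration, not the authors' evaluation code.

```python
from collections import Counter


def accuracy(pred, gold):
    """Fraction of items where the model label equals the expert label."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohen_kappa(pred, gold):
    """Chance-corrected agreement between two label sequences."""
    n = len(gold)
    observed = accuracy(pred, gold)
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    # Expected agreement if both raters labelled independently
    # according to their own marginal label frequencies.
    labels = pred_counts.keys() | gold_counts.keys()
    expected = sum(pred_counts[k] * gold_counts[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for the "Storeys" attribute (not data from the paper).
expert_labels = ["1", "2", "2", "3"]
vlm_labels = ["1", "2", "3", "3"]
print(accuracy(vlm_labels, expert_labels))
print(cohen_kappa(vlm_labels, expert_labels))
```

Accuracy alone can look high when one class dominates, which is why a chance-corrected measure is often reported alongside it.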

4.1 | LG | Mar 27, 2025

Advancing Spatiotemporal Prediction using Artificial Intelligence: Extending the Framework of Geographically and Temporally Weighted Neural Network (GTWNN) for Differing Geographical and Temporal Contexts

Nicholas Robert Fisk, Matthew Ng Kok Ming, Zahratu Shabrina

This paper aims to improve predictive crime models by extending the mathematical framework of Artificial Neural Networks (ANNs) tailored to general spatiotemporal problems and applying them appropriately. Recent advancements in the geospatial-temporal modelling field have focused on the inclusion of geographical weighting in deep learning models to account for spatial non-stationarity, which is often apparent in spatial data. We formulate a novel semi-analytical approach to solving Geographically and Temporally Weighted Regression (GTWR) and apply it to London crime data. The results produce high-accuracy predictive evaluation scores that affirm the validity of the assumptions and approximations in the approach. This paper presents mathematical advances to the Geographically and Temporally Weighted Neural Network (GTWNN) framework, which offer a novel contribution to the field. Insights from past literature are combined with these assumptions and approximations to generate three mathematical extensions to GTWNN's framework. Combinations of these extensions produce five novel ANNs, which are applied to the London and Detroit datasets. The results suggest that one of the extensions is redundant and is generally surpassed by another extension, which we term the history-dependent module. The remaining extensions form three novel ANN designs that offer potential improvements to GTWNN. We evaluate the efficacy of the various models on both the London and Detroit crime datasets, highlighting the importance of accounting for specific geographic and temporal characteristics when selecting modelling strategies to improve model suitability. In general, the proposed methods provide the foundations for a more context-aware, accurate, and robust ANN approach to spatiotemporal modelling.
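For readers unfamiliar with geographical and temporal weighting, the sketch below shows a Gaussian kernel weight over a combined space-time distance, as commonly used in the GTWR literature. It is a minimal illustration under stated assumptions: the function name, the scale parameters `lam` and `mu`, and the bandwidth value are hypothetical, and this is not the semi-analytical approach or the GTWNN extensions described in the abstract.

```python
import math


def gtwr_weight(p_i, p_j, bandwidth, lam=1.0, mu=1.0):
    """Gaussian kernel weight between two observations.

    Each point is a tuple (u, v, t): two spatial coordinates and a time.
    The squared space-time distance is assumed to be
        d^2 = lam * ((u_i - u_j)^2 + (v_i - v_j)^2) + mu * (t_i - t_j)^2
    and the weight is exp(-d^2 / bandwidth^2).
    """
    (u_i, v_i, t_i), (u_j, v_j, t_j) = p_i, p_j
    d2 = lam * ((u_i - u_j) ** 2 + (v_i - v_j) ** 2) + mu * (t_i - t_j) ** 2
    return math.exp(-d2 / bandwidth ** 2)


# Hypothetical points: nearby observations in space and time get weights
# close to 1, distant ones get weights close to 0.
target = (0.0, 0.0, 0.0)
near = (0.1, 0.0, 0.0)
far = (3.0, 4.0, 2.0)
print(gtwr_weight(target, near, bandwidth=1.0))
print(gtwr_weight(target, far, bandwidth=1.0))
```

In a weighted regression, weights like these would let each location and time fit its own local coefficients, which is how such models account for relationships that vary across space and time.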

2 Papers