James Haworth

CV
h-index5
16papers
250citations
Novelty44%
AI Score52

16 Papers

LGSep 10, 2023
Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Accident Prediction

Xiaowei Gao, Xinke Jiang, Dingyi Zhuang et al.

Traffic accidents present substantial challenges to human safety and socio-economic development in urban areas. Developing a reliable and responsible traffic accident prediction model is crucial to addressing growing public safety concerns and enhancing the safety of urban mobility systems. Traditional methods face limitations at fine spatiotemporal scales due to the sporadic nature of highrisk accidents and the predominance of non-accident characteristics. Furthermore, while most current models show promising occurrence prediction, they overlook the uncertainties arising from the inherent nature of accidents, and then fail to adequately map the hierarchical ranking of accident risk values for more precise insights. To address these issues, we introduce the Spatiotemporal Zero-Inflated Tweedie Graph Neural Network STZITDGNN -- the first uncertainty-aware probabilistic graph deep learning model in roadlevel traffic accident prediction for multisteps. This model integrates the interpretability of the statistical Tweedie family model and the expressive power of graph neural networks. Its decoder innovatively employs a compound Tweedie model,a Poisson distribution to model the frequency of accident occurrences and a Gamma distribution to assess injury severity, supplemented by a zeroinflated component to effectively identify exessive nonincident instances. Empirical tests using realworld traffic data from London, UK, demonstrate that the STZITDGNN surpasses other baseline models across multiple benchmarks and metrics, including accident risk value prediction, uncertainty minimisation, non-accident road identification and accident occurrence accuracy. Our study demonstrates that STZTIDGNN can effectively inform targeted road monitoring, thereby improving urban road safety strategies.

CVAug 20, 2024Code
V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?

Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo et al.

Road safety assessments are critical yet costly, especially in Low- and Middle-Income Countries (LMICs), where most roads remain unrated. Traditional methods require expert annotation and training data, while supervised learning-based approaches struggle to generalise across regions. In this paper, we introduce \textit{V-RoAst}, a zero-shot Visual Question Answering (VQA) framework using Vision-Language Models (VLMs) to classify road safety attributes defined by the iRAP standard. We introduce the first open-source dataset from ThaiRAP, consisting of over 2,000 curated street-level images from Thailand annotated for this task. We evaluate Gemini-1.5-flash and GPT-4o-mini on this dataset and benchmark their performance against VGGNet and ResNet baselines. While VLMs underperform on spatial awareness, they generalise well to unseen classes and offer flexible prompt-based reasoning without retraining. Our results show that VLMs can serve as automatic road assessment tools when integrated with complementary data. This work is the first to explore VLMs for zero-shot infrastructure risk assessment and opens new directions for automatic, low-cost road safety mapping. Code and dataset: https://github.com/PongNJ/V-RoAst.

LGJul 24, 2024
SMA-Hyper: Spatiotemporal Multi-View Fusion Hypergraph Learning for Traffic Accident Prediction

Xiaowei Gao, James Haworth, Ilya Ilyankou et al.

Predicting traffic accidents is the key to sustainable city management, which requires effective address of the dynamic and complex spatiotemporal characteristics of cities. Current data-driven models often struggle with data sparsity and typically overlook the integration of diverse urban data sources and the high-order dependencies within them. Additionally, they frequently rely on predefined topologies or weights, limiting their adaptability in spatiotemporal predictions. To address these issues, we introduce the Spatiotemporal Multiview Adaptive HyperGraph Learning (SMA-Hyper) model, a dynamic deep learning framework designed for traffic accident prediction. Building on previous research, this innovative model incorporates dual adaptive spatiotemporal graph learning mechanisms that enable high-order cross-regional learning through hypergraphs and dynamic adaptation to evolving urban data. It also utilises contrastive learning to enhance global and local data representations in sparse datasets and employs an advance attention mechanism to fuse multiple views of accident data and urban functional features, thereby enriching the contextual understanding of risk factors. Extensive testing on the London traffic accident dataset demonstrates that the SMA-Hyper model significantly outperforms baseline models across various temporal horizons and multistep outputs, affirming the effectiveness of its multiview fusion and adaptive learning strategies. The interpretability of the results further underscores its potential to improve urban traffic management and safety by leveraging complex spatiotemporal urban data, offering a scalable framework adaptable to diverse urban environments.

CVJul 21, 2024
Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

Jingwei Guo, Yitai Cheng, Meihui Wang et al.

Cyclists face a disproportionate risk of injury, yet conventional crash records are too sparse to identify risk factors at fine spatial and temporal scales. Recently, naturalistic studies have used video data to capture the complex behavioural and infrastructural risk factors. A promising format is panoramic video, which can record 360$^\circ$ views around a rider. However, its use is limited by distortions, large numbers of small objects, and boundary continuity, which cannot be handled using existing computer vision models. This research proposes a novel three-step framework: (1) enhancing object detection accuracy on panoramic imagery by segmenting and projecting the original 360$^\circ$ images into sub-images; (2) modifying multi-object tracking models to incorporate boundary continuity and object category information; and (3) validating through a real-world application of vehicle overtaking detection. The methodology is evaluated using panoramic videos recorded by cyclists on London's roadways under diverse conditions. Experimental results demonstrate improvements over baselines, achieving higher average precision across varying image resolutions. Moreover, the enhanced tracking approach yields a 10.0% decrease in identification switches and a 2.7% improvement in identification precision. The overtaking detection task achieves a high F-score of 0.82, illustrating the practical effectiveness of the proposed method in real-world cycling safety scenarios.

CVJun 13, 2025Code
CLIP the Landscape: Automated Tagging of Crowdsourced Landscape Images

Ilya Ilyankou, Natchapon Jongwiriyanurak, Tao Cheng et al.

We present a CLIP-based, multi-modal, multi-label classifier for predicting geographical context tags from landscape photos in the Geograph dataset--a crowdsourced image archive spanning the British Isles, including remote regions lacking POIs and street-level imagery. Our approach addresses a Kaggle competition\footnote{https://www.kaggle.com/competitions/predict-geographic-context-from-landscape-photos} task based on a subset of Geograph's 8M images, with strict evaluation: exact match accuracy is required across 49 possible tags. We show that combining location and title embeddings with image features improves accuracy over using image embeddings alone. We release a lightweight pipeline\footnote{https://github.com/SpaceTimeLab/ClipTheLandscape} that trains on a modest laptop, using pre-trained CLIP image and text embeddings and a simple classification head. Predicted tags can support downstream tasks such as building location embedders for GeoAI applications, enriching spatial understanding in data-sparse regions.

CLMay 17, 2024Code
CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl

Ilya Ilyankou, Meihui Wang, Stefano Cavazzi et al.

The Common Crawl (CC) corpus is the largest open web crawl dataset containing 9.5+ petabytes of data captured since 2008. The dataset is instrumental in training large language models, and as such it has been studied for (un)desirable content, and distilled for smaller, domain-specific datasets. However, to our knowledge, no research has been dedicated to using CC as a source of annotated geospatial data. In this paper, we introduce an efficient pipeline to extract annotated user-generated tracks from GPX files found in CC, and the resulting multimodal dataset with 1,416 pairings of human-written descriptions and MultiLineString vector data from the 6 most recent CC releases. The dataset can be used to study people's outdoor activity patterns, the way people talk about their outdoor experiences, as well as for developing trajectory generation or track annotation models, or for various other problems in place of synthetically generated routes. Our reproducible code is available on GitHub: https://github.com/ilyankou/cc-gpx

DBSep 27, 2024
CycleTrajectory: An End-to-End Pipeline for Enriching and Analyzing GPS Trajectories to Understand Cycling Behavior and Environment

Meihui Wang, James Haworth, Ilya Ilyankou et al.

Global positioning system (GPS) trajectories recorded by mobile phones or action cameras offer valuable insights into sustainable mobility, as they provide fine-scale spatial and temporal characteristics of individual travel. However, the high volume, noise, and lack of semantic information in this data poses challenges for storage, analysis, and applications. To address these issues, we propose an end-to-end pipeline named CycleTrajectory for processing high-sampling rate GPS trajectory data from action cameras, leveraging OpenStreetMap (OSM) for semantic enrichment. The methodology includes (1) Data Preparation, which includes filtration, noise removal, and resampling; (2) Map Matching, which accurately aligns GPS points with road segments using the OSRM API; (3) OSM Data integration to enrich trajectories with road infrastructure details; and (4) Variable Calculation to derive metrics like distance, speed, and infrastructure usage. Validation of the map matching results shows an error rate of 5.64%, indicating the reliability of this pipeline. This approach enhances efficient GPS data preparation and facilitates a deeper understanding of cycling behavior and the cycling environment.

IRMay 11
Much of Geospatial Web Search Is Beyond Traditional GIS

Ilya Ilyankou, Stefano Cavazzi, James Haworth

Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior filtering for toponyms or spatial keywords, identifying 181,827 geospatial queries (18.0%), nearly threefold the 6.17% labelled as Location in the original annotations. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity - costs, opening hours, contact details, weather, travel recommendations - falls outside the scope traditional GIS systems and knowledge graphs are built to serve. The categories vary substantially in the kind of answer they admit, from deterministic lookups answerable from spatial databases or knowledge graphs to evaluative or temporally volatile queries that require generative or real-time systems. We discuss implications for hybrid retrieval architectures and for benchmarks of geographic reasoning in large language models. We openly release the labelled dataset, classifier, and taxonomy.

CVFeb 18Code
A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification

Qi You, Yitai Cheng, Zichao Zeng et al.

Street-view image attribute classification is a vital downstream task of image classification, enabling applications such as autonomous driving, urban analytics, and high-definition map construction. It remains computationally demanding whether training from scratch, initialising from pre-trained weights, or fine-tuning large models. Although pre-trained vision-language models such as CLIP offer rich image representations, existing adaptation or fine-tuning methods often rely on their global image embeddings, limiting their ability to capture fine-grained, localised attributes essential in complex, cluttered street scenes. To address this, we propose CLIP-MHAdapter, a variant of the current lightweight CLIP adaptation paradigm that appends a bottleneck MLP equipped with multi-head self-attention operating on patch tokens to model inter-patch dependencies. With approximately 1.4 million trainable parameters, CLIP-MHAdapter achieves superior or competitive accuracy across eight attribute classification tasks on the Global StreetScapes dataset, attaining new state-of-the-art results while maintaining low computational cost. The code is available at https://github.com/SpaceTimeLab/CLIP-MHAdapter.

HCMar 15
The Scenic Route to Deception: Dark Patterns and Explainability Pitfalls in Conversational Navigation

Ilya Ilyankou, Stefano Cavazzi, James Haworth

As pedestrian navigation increasingly experiments with Generative AI, and in particular Large Language Models, the nature of routing risks transforming from a verifiable geometric task into an opaque, persuasive dialogue. While conversational interfaces promise personalisation, they introduce risks of manipulation and misplaced trust. We categorise these risks using a 2x2 framework based on intent and origin, distinguishing between intentional manipulations (dark patterns) and unintended harms (explainability pitfalls). We propose seamful design strategies to mitigate these harms. We suggest that one robust way to operationalise trustworthy conversational navigation is through neuro-symbolic architecture, where verifiable pathfinding algorithms ground GenAI's persuasive capabilities, ensuring systems explain their limitations and incentives as clearly as they explain the route.

CLApr 5, 2024
Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?

Ilya Ilyankou, Aldo Lipani, Stefano Cavazzi et al.

Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers, fine-tuned on general question-answering datasets for asymmetric semantic search, to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences. We find that sentence transformers have some zero-shot capabilities to understand quasi-geospatial concepts, such as route types and difficulty, suggesting their potential utility for routing recommendation systems.

CLJun 7, 2024
Quantifying Geospatial in the Common Crawl Corpus

Ilya Ilyankou, Meihui Wang, Stefano Cavazzi et al.

Large language models (LLMs) exhibit emerging geospatial capabilities, stemming from their pre-training on vast unlabelled text datasets that are often derived from the Common Crawl (CC) corpus. However, the geospatial content within CC remains largely unexplored, impacting our understanding of LLMs' spatial reasoning. This paper investigates the prevalence of geospatial data in recent Common Crawl releases using Gemini 1.5, a powerful language model. By analyzing a sample of documents and manually revising the results, we estimate that 18.7% of web documents in CC contain geospatial information such as coordinates and addresses. We find little difference in prevalence between Enlgish- and non-English-language documents. Our findings provide quantitative insights into the nature and extent of geospatial data in CC, and lay the groundwork for future studies of geospatial biases of LLMs.

CVApr 8, 2021
Re-designing cities with conditional adversarial networks

Mohamed R. Ibrahim, James Haworth, Nicola Christie

This paper introduces a conditional generative adversarial network to redesign a street-level image of urban scenes by generating 1) an urban intervention policy, 2) an attention map that localises where intervention is needed, 3) a high-resolution street-level image (1024 X 1024 or 1536 X1536) after implementing the intervention. We also introduce a new dataset that comprises aligned street-level images of before and after urban interventions from real-life scenarios that make this research possible. The introduced method has been trained on different ranges of urban interventions applied to realistic images. The trained model shows strong performance in re-modelling cities, outperforming existing methods that apply image-to-image translation in other domains that is computed in a single GPU. This research opens the door for machine intelligence to play a role in re-thinking and re-designing the different attributes of cities based on adversarial learning, going beyond the mainstream of facial landmarks manipulation or image synthesis from semantic segmentation.

CVOct 22, 2019
WeatherNet: Recognising weather and visual conditions from street-level images using deep residual learning

Mohamed R. Ibrahim, James Haworth, Tao Cheng

Extracting information related to weather and visual conditions at a given time and space is indispensable for scene awareness, which strongly impacts our behaviours, from simply walking in a city to riding a bike, driving a car, or autonomous drive-assistance. Despite the significance of this subject, it is still not been fully addressed by the machine intelligence relying on deep learning and computer vision to detect the multi-labels of weather and visual conditions with a unified method that can be easily used for practice. What has been achieved to-date is rather sectorial models that address limited number of labels that do not cover the wide spectrum of weather and visual conditions. Nonetheless, weather and visual conditions are often addressed individually. In this paper, we introduce a novel framework to automatically extract this information from street-level images relying on deep learning and computer vision using a unified method without any pre-defined constraints in the processed images. A pipeline of four deep Convolutional Neural Network (CNN) models, so-called the WeatherNet, is trained, relying on residual learning using ResNet50 architecture, to extract various weather and visual conditions such as Dawn/dusk, day and night for time detection, and glare for lighting conditions, and clear, rainy, snowy, and foggy for weather conditions. The WeatherNet shows strong performance in extracting this information from user-defined images or video streams that can be used not limited to: autonomous vehicles and drive-assistance systems, tracking behaviours, safety-related research, or even for better understanding cities through images for policy-makers.

CVSep 10, 2018
URBAN-i: From urban scenes to mapping slums, transport modes, and pedestrians in cities using deep learning and computer vision

Mohamed R. Ibrahim, James Haworth, Tao Cheng

Within the burgeoning expansion of deep learning and computer vision across the different fields of science, when it comes to urban development, deep learning and computer vision applications are still limited towards the notions of smart cities and autonomous vehicles. Indeed, a wide gap of knowledge appears when it comes to cities and urban regions in less developed countries where the chaos of informality is the dominant scheme. How can deep learning and Artificial Intelligence (AI) untangle the complexities of informality to advance urban modelling and our understanding of cities? Various questions and debates can be raised concerning the future of cities of the North and the South in the paradigm of AI and computer vision. In this paper, we introduce a new method for multipurpose realistic-dynamic urban modelling relying on deep learning and computer vision, using deep Convolutional Neural Networks (CNN), to sense and detect informality and slums in urban scenes from aerial and street view images in addition to detection of pedestrian and transport modes. The model has been trained on images of urban scenes in cities across the globe. The model shows a good validation of understanding a wide spectrum of nuances among the planned and the unplanned regions, including informal and slum areas. We attempt to advance urban modelling for better understanding the dynamics of city developments. We also aim to exemplify the significant impacts of AI in cities beyond how smart cities are discussed and perceived in the mainstream. The algorithms of the URBAN-i model are fully-coded in Python programming with the pre-trained deep learning models to be used as a tool for mapping and city modelling in the various corner of the globe, including informal settlements and slum regions.

CYAug 14, 2018
predictSLUMS: A new model for identifying and predicting informal settlements and slums in cities from street intersections using machine learning

Mohamed R. Ibrahim, Helena Titheridge, Tao Cheng et al.

Identifying current and future informal regions within cities remains a crucial issue for policymakers and governments in developing countries. The delineation process of identifying such regions in cities requires a lot of resources. While there are various studies that identify informal settlements based on satellite image classification, relying on both supervised or unsupervised machine learning approaches, these models either require multiple input data to function or need further development with regards to precision. In this paper, we introduce a novel method for identifying and predicting informal settlements using only street intersections data, regardless of the variation of urban form, number of floors, materials used for construction or street width. With such minimal input data, we attempt to provide planners and policy-makers with a pragmatic tool that can aid in identifying informal zones in cities. The algorithm of the model is based on spatial statistics and a machine learning approach, using Multinomial Logistic Regression (MNL) and Artificial Neural Networks (ANN). The proposed model relies on defining informal settlements based on two ubiquitous characteristics that these regions tend to be filled in with smaller subdivided lots of housing relative to the formal areas within the local context, and the paucity of services and infrastructure within the boundary of these settlements that require relatively bigger lots. We applied the model in five major cities in Egypt and India that have spatial structures in which informality is present. These cities are Greater Cairo, Alexandria, Hurghada and Minya in Egypt, and Mumbai in India. The predictSLUMS model shows high validity and accuracy for identifying and predicting informality within the same city the model was trained on or in different ones of a similar context.