CVAIJul 31, 2025

Fusion of Pervasive RF Data with Spatial Images via Vision Transformers for Enhanced Mapping in Smart Cities

arXiv:2508.03736v11 citationsh-index: 8Has Code
Originality Incremental advance
AI Analysis

This work addresses mapping accuracy issues for smart city applications like autonomous navigation, though it appears incremental as it builds on existing fusion methods with a new modality combination.

The paper tackles the problem of environment mapping in smart cities by integrating RF data with spatial images using a vision transformer-based architecture, achieving a macro IoU of 65.3% and outperforming baselines by significant margins.

Environment mapping is an important computing task for a wide range of smart city applications, including autonomous navigation, wireless network operations and extended reality environments. Conventional smart city mapping techniques, such as satellite imagery, LiDAR scans, and manual annotations, often suffer from limitations related to cost, accessibility and accuracy. Open-source mapping platforms have been widely utilized in artificial intelligence applications for environment mapping, serving as a source of ground truth. However, human errors and the evolving nature of real-world environments introduce biases that can negatively impact the performance of neural networks trained on such data. In this paper, we present a deep learning-based approach that integrates the DINOv2 architecture to improve building mapping by combining maps from open-source platforms with radio frequency (RF) data collected from multiple wireless user equipments and base stations. Our approach leverages a vision transformer-based architecture to jointly process both RF and map modalities within a unified framework, effectively capturing spatial dependencies and structural priors for enhanced mapping accuracy. For the evaluation purposes, we employ a synthetic dataset co-produced by Huawei. We develop and train a model that leverages only aggregated path loss information to tackle the mapping problem. We measure the results according to three performance metrics which capture different qualities: (i) The Jaccard index, also known as intersection over union (IoU), (ii) the Hausdorff distance, and (iii) the Chamfer distance. Our design achieves a macro IoU of 65.3%, significantly surpassing (i) the erroneous maps baseline, which yields 40.1%, (ii) an RF-only method from the literature, which yields 37.3%, and (iii) a non-AI fusion baseline that we designed which yields 42.2%.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes