RO, Nov 21, 2020

Semantic-Based VPS for Smartphone Localization in Challenging Urban Environments

Max Jwo Lem Lee, Li-Ta Hsu, Hoi-Fung Ng, Shang Lee

arXiv:2011.10743v12.22 citations

Originality: Highly original

AI Analysis

This work addresses the problem of accurate smartphone localization for IoT applications in urban canyons where GNSS fails, offering a substantial improvement for users in these challenging environments.

This paper proposes a novel semantic-based Visual Positioning System (VPS) that utilizes 3D city models and material-segmented images to localize smartphones in challenging urban environments. The system achieves 2.0m accuracy in high-rise streets, 5.5m in foliage-dense areas, and 15.7m in alleyways, demonstrating a 45% improvement over the current state-of-the-art method and an 8x improvement in yaw estimation (2.3°) compared to smartphone IMU.

Accurate smartphone-based outdoor localization systems for deep urban canyons are increasingly needed for IoT applications such as augmented reality and intelligent transportation. The recently developed feature-based visual positioning system (VPS) by Google detects edges in smartphone images and matches them with pre-surveyed edges in its map database. As smart cities develop, building information modeling (BIM) is becoming widely available, which provides an opportunity for a new semantic-based VPS. This article proposes a novel 3D city model and semantic-based VPS for accurate and robust pose estimation in urban canyons, where the global navigation satellite system (GNSS) tends to fail. In the offline stage, a material-segmented city model is used to generate segmented images. In the online stage, an image taken with a smartphone camera provides textual information about the surrounding environment. The approach uses computer vision algorithms to rectify and hand-segment the different types of material identified in the smartphone image. A semantic-based VPS method is then proposed to match the segmented generated images with the segmented smartphone image. Each generated image holds a pose consisting of latitude, longitude, altitude, yaw, pitch, and roll. The candidate with the maximum likelihood is regarded as the precise pose of the user. The positioning results achieve 2.0m level accuracy along a common high-rise street, 5.5m in a foliage-dense environment, and 15.7m in an alleyway, a 45% positioning improvement over the current state-of-the-art method. The yaw estimate achieves 2.3° level accuracy, an 8-fold improvement over the smartphone IMU.
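
The matching step described in the abstract (compare the segmented smartphone image against pre-generated segmented images, each tagged with a pose, and keep the best-scoring candidate) can be illustrated with a minimal Python sketch. The abstract does not specify the likelihood function, so the per-pixel label agreement used here is a stand-in assumption, and the names `Pose`, `Candidate`, `pixel_agreement`, and `best_pose`, as well as the toy label maps and material IDs, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

LabelMap = Sequence[Sequence[int]]  # one material class ID per pixel


@dataclass(frozen=True)
class Pose:
    latitude: float
    longitude: float
    altitude: float
    yaw: float
    pitch: float
    roll: float


@dataclass(frozen=True)
class Candidate:
    pose: Pose        # pose at which the image was generated from the city model
    labels: LabelMap  # material-segmented generated image


def pixel_agreement(query: LabelMap, generated: LabelMap) -> float:
    """Fraction of pixels with the same material label.

    Assumption: a simple stand-in for the paper's likelihood, which the
    abstract does not define.
    """
    total = 0
    same = 0
    for row_q, row_g in zip(query, generated):
        for q, g in zip(row_q, row_g):
            total += 1
            same += int(q == g)
    return same / total if total else 0.0


def best_pose(query: LabelMap, candidates: List[Candidate]) -> Pose:
    """Return the pose of the candidate that scores highest against the query."""
    return max(candidates, key=lambda c: pixel_agreement(query, c.labels)).pose


# Toy data with hypothetical material IDs: 0 = sky, 1 = glass, 2 = concrete.
query_labels = [[0, 0, 1],
                [2, 1, 1]]
cand_a = Candidate(Pose(22.30, 114.17, 5.0, 0.0, 0.0, 0.0),
                   [[0, 0, 0], [2, 2, 2]])
cand_b = Candidate(Pose(22.30, 114.17, 5.0, 90.0, 0.0, 0.0),
                   [[0, 0, 1], [2, 1, 2]])

estimated = best_pose(query_labels, [cand_a, cand_b])
```

In this toy case `cand_b` agrees with the query on more pixels than `cand_a`, so its pose is returned. A real system would search a dense grid of candidate positions and orientations and would likely use a more robust similarity measure than raw pixel agreement.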
