CV · Nov 29, 2023

A natural language processing-based approach: mapping human perception by understanding deep semantic features in street view images

arXiv:2311.17354v1 · 5 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the challenge of comprehensively understanding human perception in urban science, offering improved explanatory power for spatial heterogeneity, though it is incremental as it builds on existing datasets and methods.

The study tackled the problem of measuring human perception from street view images by moving beyond shallow image features to incorporate deep semantic features using a natural language processing approach, achieving better performance than previous machine learning methods with shallow features.

Over the past decade, using street view images and machine learning to measure human perception has become a mainstream research approach in urban science. However, because this approach relies only on shallow image information, it struggles to capture the deep semantic features of how people perceive a scene. In this study, we proposed a new framework based on a pre-trained natural language model to understand the relationship between human perception and the sense of a scene. First, Place Pulse 2.0 was used as our base dataset; it contains labels for six human-perceived dimensions: beautiful, safe, wealthy, depressing, boring, and lively. An image captioning network was used to extract a textual description of each street view image. Second, a pre-trained BERT model was fine-tuned with an added regression function for the six perceptual dimensions. Furthermore, we compared the performance of five traditional regression methods with our approach and conducted a migration experiment in Hong Kong. Our results show that human perception scoring based on deep semantic features performed better than previous machine learning methods based on shallow features. Deep scene semantic features provide new ideas for subsequent human perception research, as well as better explanatory power in the face of spatial heterogeneity.
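
The caption-then-regress pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the captioning network and the fine-tuned BERT model are replaced here by hand-written captions and a plain bag-of-words linear regressor trained with stochastic gradient descent, so that the example runs with the Python standard library alone. The captions, the scores, and all function names (`featurize`, `predict`, `mse`, `fit`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# The six perceptual dimensions named in the abstract.
DIMENSIONS = ["beautiful", "safe", "wealthy", "depressing", "boring", "lively"]

# Hypothetical captions standing in for the output of an image captioning network.
captions = [
    "a clean street with trees and shops",
    "an empty road beside a grey wall",
    "a busy market street with many people",
    "a narrow alley with old buildings",
]
# Hypothetical perception scores (one value per dimension), made up for this sketch.
scores = [
    [0.8, 0.7, 0.6, 0.2, 0.3, 0.6],
    [0.3, 0.4, 0.3, 0.7, 0.8, 0.1],
    [0.6, 0.6, 0.5, 0.2, 0.1, 0.9],
    [0.4, 0.3, 0.2, 0.6, 0.5, 0.3],
]

vocab = sorted({word for c in captions for word in c.lower().split()})

def featurize(caption, vocab):
    """Bag-of-words counts plus a constant bias feature (a stand-in for BERT embeddings)."""
    counts = Counter(caption.lower().split())
    return [float(counts[w]) for w in vocab] + [1.0]

def predict(weights, x):
    """One linear output per perceptual dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mse(weights, X, Y):
    """Mean squared error averaged over samples and dimensions."""
    total = 0.0
    for x, y in zip(X, Y):
        total += sum((p - t) ** 2 for p, t in zip(predict(weights, x), y))
    return total / (len(X) * len(Y[0]))

def fit(X, Y, lr=0.05, epochs=200):
    """Train the multi-output linear regressor with per-sample gradient steps."""
    weights = [[0.0] * len(X[0]) for _ in range(len(Y[0]))]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            preds = predict(weights, x)
            for d, (p, t) in enumerate(zip(preds, y)):
                err = p - t
                for j, xj in enumerate(x):
                    weights[d][j] -= lr * err * xj
    return weights

X = [featurize(c, vocab) for c in captions]
initial_loss = mse([[0.0] * len(X[0]) for _ in DIMENSIONS], X, scores)
weights = fit(X, scores)
final_loss = mse(weights, X, scores)

# Score a new (hypothetical) caption on all six dimensions.
new_scores = predict(weights, featurize("a quiet street with trees", vocab))
for name, value in zip(DIMENSIONS, new_scores):
    print(f"{name}: {value:.2f}")
```

In the framework the abstract describes, `featurize` would correspond to encoding the generated caption with a pre-trained BERT model, and the linear layer would be the regression function added during fine-tuning; words outside the toy vocabulary (such as "quiet" above) are simply ignored by this stand-in.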

