Enriching Location Representation with Detailed Semantic Information
This work addresses the need for semantically richer spatial representations in urban modeling, offering incremental improvements of interest to researchers and practitioners in urban planning and AI.
The paper tackles the problem that traditional spatial embeddings underuse fine-grained contextual information in urban environments. It introduces CaLLiPer+, which integrates POI names and categorical labels in a multimodal contrastive learning framework, and reports performance gains of 4% to 11% over baseline methods on land use classification and socioeconomic status mapping tasks.
Spatial representations that capture both structural and semantic characteristics of urban environments are essential for urban modeling. Traditional spatial embeddings often prioritize spatial proximity while underutilizing fine-grained contextual information from places. To address this limitation, we introduce CaLLiPer+, an extension of the CaLLiPer model that systematically integrates Point-of-Interest (POI) names alongside categorical labels within a multimodal contrastive learning framework. We evaluate its effectiveness on two downstream tasks, land use classification and socioeconomic status distribution mapping, demonstrating consistent performance gains of 4% to 11% over baseline methods. Additionally, we show that incorporating POI names enhances location retrieval, enabling models to capture complex urban concepts with greater precision. Ablation studies further reveal the complementary role of POI names and the advantages of leveraging pretrained text encoders for spatial representations. Overall, our findings highlight the potential of integrating fine-grained semantic attributes and multimodal learning techniques to advance the development of urban foundation models.
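The multimodal contrastive idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a CLIP-style symmetric InfoNCE objective in which each location embedding is pulled toward the text embedding of its own POI description and pushed away from the others in the batch. The function names, the temperature of 0.07, and the "name (category)" text template are all hypothetical choices made for this example, and the encoders that would produce the embeddings are left out.

```python
import numpy as np


def describe_poi(name, category):
    # Hypothetical text template combining a POI name with its categorical
    # label; the template actually used by CaLLiPer+ is not given here.
    return f"{name} ({category})"


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(loc_emb, text_emb, temperature=0.07):
    # loc_emb[i] and text_emb[i] are assumed to describe the same place.
    # Both arrays have shape (batch, dim).
    loc = l2_normalize(np.asarray(loc_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = loc @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Location-to-text direction: the matching text is the correct "class".
    loss_l2t = -log_softmax(logits)[idx, idx].mean()
    # Text-to-location direction: the same objective on the transposed matrix.
    loss_t2l = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_l2t + loss_t2l)


if __name__ == "__main__":
    # Toy embeddings: identity rows stand in for perfectly aligned pairs.
    aligned = np.eye(4)
    shuffled = np.roll(aligned, 1, axis=0)  # every pair is now mismatched
    print(describe_poi("Example Cafe", "cafe"))
    print("aligned loss: ", symmetric_contrastive_loss(aligned, aligned))
    print("shuffled loss:", symmetric_contrastive_loss(aligned, shuffled))
```

In this toy setup the aligned batch yields a loss close to zero while the mismatched batch yields a much larger one, which is the signal a contrastive objective of this kind uses to align the two modalities.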