Hex2vec -- Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags
This work addresses the need for context-aware embeddings of geographic data for urban planning and analysis, but its contribution is incremental, since it adapts existing NLP methods to a new domain.
The paper tackles the problem of learning vector representations of urban regions from OpenStreetMap tags in order to detect similarities between regions and infer land-use patterns. The resulting embeddings exhibit semantic structures akin to those found in language models and enable a region typology through clustering.
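Once each region has an embedding, "detecting similarities" and "clustering into a typology" reduce to operations on vectors. The sketch below illustrates both with the standard library only. It assumes cosine similarity as the similarity measure and average-linkage agglomerative clustering on cosine distance; the linkage criterion and distance are assumptions for illustration, not necessarily the paper's choices. The region IDs (`park_1`, `shops_1`, ...) and their 3-dimensional vectors are made-up toy data, not learned hex2vec embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=1):
    """Return the k region IDs whose embeddings are closest to query_id by cosine similarity."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), region_id)
        for region_id, vec in embeddings.items()
        if region_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [region_id for _, region_id in scored[:k]]


def agglomerative_clusters(embeddings, n_clusters):
    """Naive bottom-up clustering: repeatedly merge the two clusters with the
    smallest average cosine distance until n_clusters remain."""
    clusters = [[region_id] for region_id in embeddings]

    def average_distance(c1, c2):
        total = sum(
            1.0 - cosine_similarity(embeddings[a], embeddings[b])
            for a in c1
            for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Toy, made-up 3-dimensional embeddings for four hypothetical hexagons.
embeddings = {
    "park_1": [0.9, 0.1, 0.0],
    "park_2": [0.8, 0.2, 0.1],
    "shops_1": [0.1, 0.9, 0.2],
    "shops_2": [0.0, 0.8, 0.3],
}

print(most_similar("park_1", embeddings))  # ['park_2']
print(sorted(sorted(c) for c in agglomerative_clusters(embeddings, 2)))
# [['park_1', 'park_2'], ['shops_1', 'shops_2']]
```

The quadratic pairwise search keeps the example readable; for thousands of hexagons a library implementation of agglomerative clustering would be the practical choice.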
Representation learning of spatial and geographic data is a rapidly developing field that allows for similarity detection between areas and high-quality inference using deep neural networks. Past approaches, however, have concentrated on embedding raster imagery (maps, street or satellite photos), mobility data or road networks. In this paper, we propose the first approach to learning vector representations of OpenStreetMap regions with respect to urban functions and land use in a micro-region grid. We identify a subset of OSM tags related to major characteristics of land use, buildings and urban region functions, as well as types of water, green or other natural areas. After manually verifying tagging quality, we selected 36 cities for training the region representations. Uber's H3 index was used to divide the cities into hexagons, and the OSM tags were aggregated for each hexagon. We propose the hex2vec method, based on the Skip-gram model with negative sampling. The resulting vector representations exhibit semantic structures of the map characteristics, similar to those found in vector-based language models. We also present insights from region similarity detection in six Polish cities and propose a region typology obtained through agglomerative clustering.
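The abstract names two steps, aggregating OSM tags per H3 hexagon and training with the Skip-gram model with negative sampling, without giving their exact form. The following is a minimal stdlib sketch of both under stated assumptions, not the hex2vec implementation: (i) each hexagon is described by a count vector over a hypothetical tag vocabulary `TAG_VOCAB`; (ii) a hexagon's embedding is a linear projection of that vector through a weight matrix; (iii) by analogy with word2vec, a neighbouring hexagon serves as the positive context and randomly drawn hexagons as negatives. The helper names, hexagon IDs (`hex_a`, ...) and weights are all hypothetical. Only the loss is computed; gradient updates are omitted.

```python
import math
from collections import Counter

# Hypothetical tag vocabulary; the paper's actual OSM tag subset is not reproduced here.
TAG_VOCAB = [
    "landuse=residential",
    "landuse=retail",
    "leisure=park",
    "natural=water",
    "building=house",
]


def aggregate_tags(features):
    """Count vocabulary tags per hexagon.

    `features` is a list of (hex_id, tags) pairs, where `tags` is a dict of
    OSM key/value pairs for one map object already assigned to a hexagon.
    Returns {hex_id: [count per TAG_VOCAB entry]}.
    """
    counts = {}
    for hex_id, tags in features:
        counter = counts.setdefault(hex_id, Counter())
        for key, value in tags.items():
            tag = f"{key}={value}"
            if tag in TAG_VOCAB:  # ignore tags outside the selected subset
                counter[tag] += 1
    return {hex_id: [c[t] for t in TAG_VOCAB] for hex_id, c in counts.items()}


def embed(tag_vector, weights):
    """Linear projection of a tag-count vector; `weights` has one row per embedding dimension."""
    return [sum(w * x for w, x in zip(row, tag_vector)) for row in weights]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_loss(target, context, negatives):
    """Skip-gram negative-sampling loss for one (target, context) pair:
    -log s(t.c) - sum over negatives n of log s(-t.n), with s the logistic sigmoid."""
    loss = -log_sigmoid(dot(target, context))
    for negative in negatives:
        loss -= log_sigmoid(-dot(target, negative))
    return loss


# Made-up objects in three hypothetical hexagons (placeholder IDs, not real H3 indexes).
features = [
    ("hex_a", {"leisure": "park"}),
    ("hex_a", {"natural": "water"}),
    ("hex_b", {"leisure": "park"}),
    ("hex_c", {"landuse": "retail"}),
    ("hex_c", {"building": "house"}),
    ("hex_c", {"amenity": "bench"}),  # not in TAG_VOCAB, so it is ignored
]
vectors = aggregate_tags(features)

# Arbitrary hand-set 2-D projection: dimension 0 ~ green/blue tags, dimension 1 ~ built-up tags.
weights = [
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]
emb = {hex_id: embed(vec, weights) for hex_id, vec in vectors.items()}

# hex_b acts as a neighbour (positive context) of hex_a; hex_c as a negative sample.
print(round(sgns_loss(emb["hex_a"], emb["hex_b"], [emb["hex_c"]]), 4))  # 0.8201
# Swapping the roles yields a much larger loss.
print(round(sgns_loss(emb["hex_a"], emb["hex_c"], [emb["hex_b"]]), 4))  # 2.8201
```

In a real pipeline the hexagon IDs would come from an H3 binding (for instance `h3.latlng_to_cell(lat, lng, res)` in h3-py 4.x, with neighbours from `h3.grid_disk(cell, 1)`), the weights would be learned by minimising this loss rather than set by hand, and negatives would be sampled at random from the training cities' hexagons.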