AICYLGOct 10, 2025

Beyond AlphaEarth: Toward Human-Centered Spatial Representation via POI-Guided Contrastive Learning

arXiv:2510.09894v12 citationsh-index: 9
Originality Incremental advance
AI Analysis

This work addresses the problem of integrating human-centered semantics into geospatial foundation models for urban analysis, representing an incremental advancement over existing EO-based methods.

The paper tackles the limitation of Earth Observation-driven spatial representations in capturing human-centered urban functions by proposing AETHER, a framework that enriches AlphaEarth embeddings with Points of Interest data, resulting in a 7.2% relative improvement in land-use classification F1 and a 23.6% relative reduction in Kullback-Leibler divergence for socioeconomic mapping in Greater London.

General-purpose spatial representations are essential for building transferable geospatial foundation models (GFMs). Among them, the AlphaEarth Foundation (AE) represents a major step toward a global, unified representation of the Earth's surface, learning 10-meter embeddings from multi-source Earth Observation (EO) data that capture rich physical and environmental patterns across diverse landscapes. However, such EO-driven representations remain limited in capturing the functional and socioeconomic dimensions of cities, as they primarily encode physical and spectral patterns rather than human activities or spatial functions. We propose AETHER (AlphaEarth-POI Enriched Representation Learning), a lightweight framework that adapts AlphaEarth to human-centered urban analysis through multimodal alignment guided by Points of Interest (POIs). AETHER aligns AE embeddings with textual representations of POIs, enriching physically grounded EO features with semantic cues about urban functions and socioeconomic contexts. In Greater London, AETHER achieves consistent gains over the AE baseline, with a 7.2% relative improvement in land-use classification F1 and a 23.6% relative reduction in Kullback-Leibler divergence for socioeconomic mapping. Built upon pretrained AE, AETHER leverages a lightweight multimodal alignment to enrich it with human-centered semantics while remaining computationally efficient and scalable for urban applications. By coupling EO with human-centered semantics, it advances geospatial foundation models toward general-purpose urban representations that integrate both physical form and functional meaning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes