SIApr 10, 2025
S2Vec: Self-Supervised Geospatial EmbeddingsShushman Choudhury, Elad Aharoni, Chandrakumari Suvarna et al.
Scalable general-purpose representations of the built environment are crucial for geospatial artificial intelligence applications. This paper introduces S2Vec, a novel self-supervised framework for learning such geospatial embeddings. S2Vec uses the S2 Geometry library to partition large areas into discrete S2 cells, rasterizes built environment feature vectors within cells as images, and applies masked autoencoding on these rasterized images to encode the feature vectors. This approach yields task-agnostic embeddings that capture local feature characteristics and broader spatial relationships. We evaluate S2Vec on three large-scale socioeconomic prediction tasks, showing its competitive performance against state-of-the-art image-based embeddings. We also explore the benefits of combining S2Vec embeddings with image-based embeddings downstream, showing that such multimodal fusion can often improve performance. Our results highlight how S2Vec can learn effective general-purpose geospatial representations and how it can complement other data modalities in geospatial artificial intelligence.
LGMay 9, 2024
Scalable Learning of Segment-Level Traffic Congestion FunctionsShushman Choudhury, Abdul Rahman Kreidieh, Iveel Tsogsuren et al.
We propose and study a data-driven framework for identifying traffic congestion functions (numerical relationships between observations of traffic variables) at global scale and segment-level granularity. In contrast to methods that estimate a separate set of parameters for each roadway, ours learns a single black-box function over all roadways in a metropolitan area. First, we pool traffic data from all segments into one dataset, combining static attributes with dynamic time-dependent features. Second, we train a feed-forward neural network on this dataset, which we can then use on any segment in the area. We evaluate how well our framework identifies congestion functions on observed segments and how it generalizes to unobserved segments and predicts segment attributes on a large dataset covering multiple cities worldwide. For identification error on observed segments, our single data-driven congestion function compares favorably to segment-specific model-based functions on highway roads, but has room to improve on arterial roads. For generalization, our approach shows strong performance across cities and road types: both on unobserved segments in the same city and on zero-shot transfer learning between cities. Finally, for predicting segment attributes, we find that our approach can approximate critical densities for individual segments using their static properties.