DB · May 21

OSM+: Billion-Level OpenStreetMap Dataset for City-wide Experiments

arXiv:2512.06743 · 43.1 · h-index: 4 · Has Code
Predicted impact: top 34% in DB (last 90 days) · Originality: Synthesis-oriented
AI Analysis

Provides a large-scale, standardized road network dataset for benchmarking graph learning models, addressing the lack of billion-scale real-world graphs in the field.

The authors created OSM+, a billion-vertex road network dataset built from OpenStreetMap with distributed computing, enabling city-scale experiments. They demonstrate its utility through city boundary detection and two new benchmarks: traffic prediction across 31 cities, scaling from hundreds to thousands of road network intersections, and traffic policy control across 6 cities, which poses thousand-scale multi-agent coordination challenges.
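
The "intersections" counted here are graph vertices derived from OSM's raw data model, in which a way is an ordered list of node IDs. As a minimal sketch of one common way to collapse ways into an intersection-level graph (not the authors' distributed pipeline, which is not described here), the code below assumes a hypothetical in-memory `ways` dict. A node becomes a vertex if it ends a way or is referenced two or more times; intermediate shape points are folded into edges.

```python
from collections import Counter


def build_road_graph(ways):
    """Collapse OSM ways into an intersection-level road graph.

    ways: dict mapping way_id -> ordered list of OSM node ids (a hypothetical
    in-memory input; real OSM extracts must first be parsed from PBF or XML).
    Returns (vertices, edges): vertices are node ids that end a way or are
    referenced two or more times; edges are (u, v, way_id, n_segments) tuples
    linking consecutive vertices along each way.
    """
    # Ignore degenerate ways that cannot form a road segment.
    ways = {w: nodes for w, nodes in ways.items() if len(nodes) >= 2}

    # A node referenced by two or more ways (or twice by one way) is a junction.
    refs = Counter(n for nodes in ways.values() for n in nodes)
    vertices = {n for n, count in refs.items() if count >= 2}

    # Way endpoints are kept as vertices too (dead ends, extract boundaries).
    for nodes in ways.values():
        vertices.add(nodes[0])
        vertices.add(nodes[-1])

    # Walk each way and emit one edge per run between consecutive vertices;
    # the number of raw OSM segments in that run is stored as n_segments.
    edges = []
    for way_id, nodes in ways.items():
        start = 0
        for i in range(1, len(nodes)):
            if nodes[i] in vertices:
                edges.append((nodes[start], nodes[i], way_id, i - start))
                start = i
    return vertices, edges


# Two streets crossing at node 3; node 2 is only a shape point on way "A".
ways = {"A": [1, 2, 3, 4], "B": [5, 3, 6]}
vertices, edges = build_road_graph(ways)
print(sorted(vertices))  # [1, 3, 4, 5, 6]
print(edges)  # [(1, 3, 'A', 2), (3, 4, 'A', 1), (5, 3, 'B', 1), (3, 6, 'B', 1)]
```

A real pipeline would also filter ways by road tags and split work across machines, which this single-process sketch omits.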

Road network data provides rich information about cities, but processing worldwide OpenStreetMap (OSM) data is computationally intensive, and the resulting graphs are often difficult to unify for benchmarking downstream tasks. Existing graph learning benchmarks fail to capture the billion-vertex scale and distinctive topological properties of real-world road networks, leaving model scalability underexplored. To close this gap, we process OSM data with distributed cloud computing on 5,000 cores and release OSM+, a structured, worldwide, 1-billion-vertex road network graph dataset designed for accessibility and usability. OSM+ is open source and globally downloadable, and it provides an open-box graph structure and an easy spatial query interface; the evaluated release is a fixed snapshot for reproducibility, with a versioned update plan for future releases. We demonstrate the utility of OSM+ through three illustrative use cases: city boundary detection, traffic prediction, and traffic policy control. For traffic prediction, we construct a new 31-city benchmark by processing traffic data and combining it with OSM+; it offers broader spatial coverage and more comprehensive evaluation than commonly used datasets while scaling from hundreds to thousands of road network intersections. For traffic policy control, we release a new six-city dataset at a much larger scale, introducing challenges for thousand-scale multi-agent coordination. We also provide data processing tools for integrating multimodal spatial-temporal data with OSM+ for geospatial foundation model training, with the aim of accelerating scientific discovery.
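
The abstract mentions an easy spatial query interface and combining external traffic data with the road graph, but does not specify either API. As a hedged illustration only, the sketch below assumes vertices carry (lon, lat) coordinates in a plain dict and shows two generic operations: cropping a bounding-box subgraph, as a city-scale experiment might, and snapping a traffic sensor to its nearest vertex by great-circle distance. The names `BBox`, `bbox_subgraph`, `haversine_m`, and `snap_to_nearest` are hypothetical and are not OSM+'s interface.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned lon/lat box (hypothetical helper, inclusive bounds)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


def bbox_subgraph(coords, edges, box):
    """Keep vertices inside `box` and edges whose endpoints are both kept.

    coords: dict vertex_id -> (lon, lat); edges: iterable of (u, v) pairs.
    """
    keep = {v for v, (lon, lat) in coords.items() if box.contains(lon, lat)}
    sub_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, sub_edges


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres on a spherical Earth (R = 6,371 km)."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def snap_to_nearest(coords, lon, lat):
    """Brute-force O(V) nearest vertex to a sensor location."""
    return min(coords, key=lambda v: haversine_m(lon, lat, *coords[v]))


# Toy graph: two nearby vertices and one far away.
coords = {1: (0.0, 0.0), 2: (0.01, 0.0), 3: (1.0, 1.0)}
edges = [(1, 2), (2, 3)]
kept, sub_edges = bbox_subgraph(coords, edges, BBox(-0.1, -0.1, 0.1, 0.1))
print(sorted(kept), sub_edges)  # [1, 2] [(1, 2)]
print(snap_to_nearest(coords, lon=0.009, lat=0.0))  # 2
```

The linear scans are only suitable for small extracts; at billion-vertex scale a spatial index (for example an R-tree or a grid partition) would be needed instead.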
