CL, AI, SI · Sep 26, 2025

Human Mobility Datasets Enriched With Contextual and Social Dimensions

arXiv:2510.02333v1 · h-index: 7 · Has Code
Originality: Synthesis-oriented
AI Analysis

This resource provides enriched datasets for researchers in mobility analysis and semantic web applications. It is incremental, however, since it builds on existing GPS data and adds new features on top of it.

The authors tackle the limited semantic richness of human mobility datasets by releasing two publicly available datasets of GPS trajectories covering Paris and New York. The trajectories are enriched with contextual layers and with synthetic social media posts generated by LLMs, and the datasets support tasks such as behavior modeling and mobility prediction.

In this resource paper, we present two publicly available datasets of semantically enriched human trajectories, together with the pipeline to build them. The trajectories are publicly available GPS traces retrieved from OpenStreetMap. Each dataset includes contextual layers such as stops, moves, points of interest (POIs), inferred transportation modes, and weather data. A novel semantic feature is the inclusion of synthetic, realistic social media posts generated by Large Language Models (LLMs), enabling multimodal and semantic mobility analysis. The datasets are available in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning and FAIR data practices. They cover two structurally distinct, large cities: Paris and New York. Our open source reproducible pipeline allows for dataset customization, while the datasets support research tasks such as behavior modeling, mobility prediction, knowledge graph construction, and LLM-based applications. To our knowledge, our resource is the first to combine real-world movement, structured semantic enrichment, LLM-generated text, and semantic web compatibility in a reusable framework.
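To make the "stops and moves" layer more concrete, here is a minimal sketch of how a GPS trace can be split into stops and moves. It assumes a simple stay-point heuristic (a stop is a run of fixes that all stay within a distance threshold of the run's first fix for at least a minimum duration). The function names, the 100 m / 5 min thresholds, and the toy coordinates are hypothetical and illustrative only; the authors' pipeline may use a different method and different parameters.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Point:
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    t: float    # timestamp in seconds


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two GPS fixes, in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_stops(points: List[Point], max_dist_m: float = 100.0,
                 min_duration_s: float = 300.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of stops.

    Hypothetical stay-point heuristic: starting at fix i, extend the run while
    each next fix lies within max_dist_m of fix i; keep the run as a stop if it
    spans at least min_duration_s seconds.
    """
    stops: List[Tuple[int, int]] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        while j < n and haversine_m(points[i], points[j]) <= max_dist_m:
            j += 1
        # points[i..j-1] all lie close to points[i]
        if points[j - 1].t - points[i].t >= min_duration_s:
            stops.append((i, j - 1))
            i = j  # resume after the stop
        else:
            i += 1  # no stop anchored here; try the next fix
    return stops


def moves_between(stops: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """A move links the last fix of one stop to the first fix of the next."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(stops, stops[1:])]


# Synthetic toy trace (not from the datasets): a stay, a short trip, a second stay.
trajectory = (
    [Point(48.8584, 2.2945, t) for t in (0, 150, 300, 450)]
    + [Point(48.8600, 2.3000, 500), Point(48.8620, 2.3100, 560),
       Point(48.8640, 2.3200, 620)]
    + [Point(48.8606, 2.3376, t) for t in (700, 900, 1100)]
)

if __name__ == "__main__":
    stops = detect_stops(trajectory)
    print("stops:", stops)
    print("moves:", moves_between(stops))
```

On this toy trace the sketch finds two stops (indices 0 to 3 and 7 to 9) joined by one move. In a full enrichment pipeline, each detected stop would then be the natural unit to match against nearby POIs and weather records, while each move would be the unit for inferring a transportation mode.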
