TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis
This work addresses semantic GPS trajectory analysis for geospatial AI applications and represents an incremental advance achieved through multimodal integration.
The authors address the problem of extracting deep semantic representations from GPS trajectory data by proposing TrajSceneLLM, a multimodal framework that combines map images with LLM-generated textual descriptions to create trajectory scene embeddings. They report a significant performance improvement on Travel Mode Identification.
GPS trajectory data reveals valuable patterns of human mobility and urban dynamics, supporting a variety of spatial applications. However, traditional methods often struggle to extract deep semantic representations and to incorporate contextual map information. We propose TrajSceneLLM, a multimodal perspective for enhancing semantic understanding of GPS trajectories. The framework integrates visualized map images, which encode spatial context, with textual descriptions generated through LLM reasoning, which capture temporal sequences and movement dynamics. A separate embedding is generated for each modality, and the two are concatenated to produce semantically rich trajectory scene embeddings, which are then paired with a simple MLP classifier. We validate the proposed framework on Travel Mode Identification (TMI), a critical task for analyzing travel choices and understanding mobility behavior. Our experiments show that these embeddings achieve a significant performance improvement, highlighting the advantage of our LLM-driven method in capturing deep spatio-temporal dependencies and reducing reliance on handcrafted features. This semantic enhancement holds significant potential for diverse downstream applications and future research in geospatial artificial intelligence. The source code and dataset are publicly available at: https://github.com/februarysea/TrajSceneLLM.
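The fusion step described above (one embedding per modality, concatenated, then classified by a simple MLP) can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding sizes, hidden width, number of travel modes, and the random placeholder embeddings and weights are all assumptions made for illustration. In practice the two embeddings would come from an image encoder and a text encoder, and the MLP weights would be learned.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
IMG_DIM, TXT_DIM, HIDDEN, N_MODES = 8, 6, 16, 5


def random_vector(n):
    """Placeholder embedding standing in for a real encoder output."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def random_matrix(rows, cols, scale=0.1):
    """Untrained placeholder weights with shape (rows, cols)."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def fuse(image_emb, text_emb):
    """Concatenate the two modality embeddings into one scene embedding."""
    return image_emb + text_emb


def linear(x, w, b):
    """Affine map x @ w + b, where w has shape (len(x), len(b))."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer with ReLU, followed by a softmax over travel modes."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return softmax(linear(hidden, w2, b2))


# Placeholder embeddings for a single trajectory.
image_emb = random_vector(IMG_DIM)
text_emb = random_vector(TXT_DIM)
scene_emb = fuse(image_emb, text_emb)

# Untrained weights; a real classifier would fit these on labeled trajectories.
w1, b1 = random_matrix(IMG_DIM + TXT_DIM, HIDDEN), [0.0] * HIDDEN
w2, b2 = random_matrix(HIDDEN, N_MODES), [0.0] * N_MODES

probs = mlp_forward(scene_emb, w1, b1, w2, b2)
predicted_mode = probs.index(max(probs))  # index of the most probable travel mode
```

Concatenation keeps the two modalities separate until the classifier, so either encoder could be swapped out without changing the other. With untrained weights the predicted index is arbitrary; the sketch shows only the data flow.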