GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
This work addresses the challenge of capturing normalcy in mobility data for trajectory analytics and positions mobility data as a foundational modality for AI. The contribution is incremental, however, since it applies transformer-based methods to a specific domain.
The paper tackles the problem of modeling human movement patterns from GPS trajectories by introducing GPS-MTM, a foundation model that decomposes mobility into states and actions and learns from them with self-supervised learning. Across benchmark datasets, it consistently outperforms the compared baselines on tasks such as trajectory infilling and next-stop prediction.
Foundation models have driven remarkable progress in text, vision, and video understanding, and are now poised to unlock similar breakthroughs in trajectory modeling. We introduce the GPS-Masked Trajectory Transformer (GPS-MTM), a foundation model for large-scale mobility data that captures patterns of normalcy in human movement. Unlike prior approaches that flatten trajectories into coordinate streams, GPS-MTM decomposes mobility into two complementary modalities: states (point-of-interest categories) and actions (agent transitions). Using a bidirectional Transformer trained with a self-supervised masked modeling objective, the model reconstructs missing segments across both modalities, which lets it learn rich semantic correlations without manual labels. On benchmark datasets including Numosim-LA, Urban Anomalies, and Geolife, GPS-MTM consistently outperforms the compared baselines on downstream tasks such as trajectory infilling and next-stop prediction. Its advantages are most pronounced in dynamics tasks (inverse and forward dynamics), where contextual reasoning is critical. These results establish GPS-MTM as a robust foundation model for trajectory analytics and position mobility data as a first-class modality for large-scale representation learning. The code is publicly released.
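In masked trajectory modeling, a single pretrained model can serve several downstream tasks because each task amounts to a different choice of which tokens to hide. The following is a minimal, hypothetical sketch of that idea. It assumes states and actions are interleaved as s_0, a_0, s_1, a_1, ..., and that actions are discrete transition labels. The helper names `interleave`, `task_mask`, and `apply_mask`, the `[MASK]` token, the travel-mode action vocabulary, and the exact mask definitions for each task are illustrative assumptions, not details taken from the GPS-MTM paper or its released code.

```python
# Hypothetical sketch: the token layout and task masks are assumptions,
# not the GPS-MTM implementation.
import random
from typing import List, Optional

MASK = "[MASK]"  # hypothetical placeholder token


def interleave(states: List[str], actions: List[str]) -> List[str]:
    """Build s_0, a_0, s_1, ..., s_{T-1}; state i sits at index 2i, action i at 2i+1."""
    if len(actions) != len(states) - 1:
        raise ValueError("expected exactly one action between consecutive states")
    tokens: List[str] = []
    for i, state in enumerate(states):
        tokens.append(state)
        if i < len(actions):
            tokens.append(actions[i])
    return tokens


def task_mask(num_states: int, task: str, t: int = 0, u: Optional[int] = None,
              ratio: float = 0.3, seed: int = 0) -> List[bool]:
    """Boolean mask over the interleaved sequence; True marks a token to reconstruct."""
    n = 2 * num_states - 1
    if task in ("next_stop", "forward", "inverse") and not 0 <= t < num_states - 1:
        raise ValueError("t must leave at least one later state")
    if task == "random":
        # Pretraining: hide a random subset of tokens from either modality.
        k = max(1, int(ratio * n))
        hidden = set(random.Random(seed).sample(range(n), k))
    elif task == "infill":
        # Trajectory infilling: keep states t and u, hide every token between them.
        if u is None or not 0 <= t < u < num_states:
            raise ValueError("infill needs 0 <= t < u < num_states")
        hidden = set(range(2 * t + 1, 2 * u))
    elif task == "next_stop":
        # Next-stop prediction: observe history up to state t, hide everything after it.
        hidden = set(range(2 * t + 1, n))
    elif task == "forward":
        # Forward dynamics: observe state t and action t, hide state t+1 onward.
        hidden = set(range(2 * t + 2, n))
    elif task == "inverse":
        # Inverse dynamics: observe states t and t+1, hide only the action between them.
        hidden = {2 * t + 1}
    else:
        raise ValueError(f"unknown task: {task}")
    return [i in hidden for i in range(n)]


def apply_mask(tokens: List[str], mask: List[bool]) -> List[str]:
    """Replace hidden tokens with MASK; this is the input the Transformer would see."""
    return [MASK if m else tok for tok, m in zip(tokens, mask)]


if __name__ == "__main__":
    states = ["home", "cafe", "office", "gym"]  # hypothetical POI categories
    actions = ["walk", "drive", "walk"]         # hypothetical transition labels
    tokens = interleave(states, actions)
    print(apply_mask(tokens, task_mask(len(states), "inverse", t=1)))
    # ['home', 'walk', 'cafe', '[MASK]', 'office', 'walk', 'gym']
```

In a full training loop, a bidirectional Transformer would take the masked sequence as input and be optimized to reconstruct only the hidden positions. At inference time, the same model is queried with a task-specific mask. Distinguishing next-stop prediction (the action is also hidden) from forward dynamics (the action is observed) is one plausible convention adopted for this sketch, not necessarily the one GPS-MTM uses.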