CV · RO · Mar 5, 2025

Unified Human Localization and Trajectory Prediction with Monocular Vision

arXiv:2503.03535v1 · 5 citations · h-index: 7 · Has Code · ICRA
Originality: Incremental advance
AI Analysis

This work addresses the need for robust human trajectory prediction in robotics applications, where conventional models overfit to clean data, by providing a unified method that handles noisy inputs, though it is incremental in combining existing tasks.

The paper tackles human trajectory prediction in noisy real-world settings by proposing MonoTransmotion, a Transformer-based framework that jointly performs localization and prediction using only a monocular camera. It achieves around 12% improvement over baselines on curated data and maintains similar performance on non-curated data.

Conventional human trajectory prediction models rely on clean, curated data, which requires specialized equipment or manual labeling and is often impractical for robotic applications. Existing predictors tend to overfit to clean observations, which reduces their robustness when they are used with noisy inputs. In this work, we propose MonoTransmotion (MT), a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. Our framework has two main modules: Bird's Eye View (BEV) localization and trajectory prediction. The BEV localization module estimates the position of a person using 2D human poses, enhanced by a novel directional loss for smoother sequential localizations. The trajectory prediction module predicts future motion from these estimates. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios with noisy inputs. We validate our MT network on both curated and non-curated datasets. On the curated dataset, MT achieves around 12% improvement over baseline models on BEV localization and trajectory prediction. On a real-world non-curated dataset, experimental results indicate that MT maintains similar performance levels, highlighting its robustness and generalization capability. The code is available at https://github.com/vita-epfl/MonoTransmotion.
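
The abstract mentions a directional loss that encourages smoother sequential localizations but does not define it. As a rough illustration only, the sketch below assumes one plausible form: the mean of (1 - cosine similarity) between consecutive displacement vectors of the predicted and ground-truth BEV positions. The function name `directional_loss`, its arguments, and this formulation are all hypothetical and are not taken from the paper or its repository.

```python
import math


def directional_loss(pred, target, eps=1e-8):
    """Hypothetical directional loss (an assumption, not the paper's definition).

    pred, target: equal-length sequences of (x, y) BEV positions, one per frame.
    Returns the mean of (1 - cosine similarity) between the step-to-step
    displacement vectors of the two sequences. The value is near 0 when every
    predicted step points the same way as the ground-truth step and near 2
    when every step points the opposite way.
    """
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    steps = len(pred) - 1
    total = 0.0
    for t in range(steps):
        # Displacement between consecutive frames for each sequence.
        px, py = pred[t + 1][0] - pred[t][0], pred[t + 1][1] - pred[t][1]
        gx, gy = target[t + 1][0] - target[t][0], target[t + 1][1] - target[t][1]

        dot = px * gx + py * gy
        norm = math.hypot(px, py) * math.hypot(gx, gy)
        # eps avoids division by zero when a step has zero length.
        total += 1.0 - dot / (norm + eps)
    return total / steps


if __name__ == "__main__":
    ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    jittery = [(0.0, 0.0), (1.0, 0.4), (2.0, -0.3)]
    print(directional_loss(ground_truth, ground_truth))
    print(directional_loss(jittery, ground_truth))
```

In a joint training setup such a term would typically be added to a position regression loss, so that the localization output is penalized both for being far from the ground truth and for moving in an inconsistent direction from frame to frame. How the paper actually weights or combines its losses is not stated in the abstract.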

Code Implementations: 1 repo
