CV · Mar 9

AutoTraces: Autoregressive Trajectory Forecasting via Multimodal Large Language Models

arXiv:2603.07989v1
Predicted impact: top 18% in CV · last 90 days
Originality: Highly original
AI Analysis

This work addresses the problem of robot trajectory forecasting in complex human environments, which is crucial for safe and efficient robot navigation.

This paper introduces AutoTraces, an autoregressive vision-language-trajectory model that uses large language models (LLMs) to forecast robot trajectories in human environments. It achieves state-of-the-art forecasting accuracy, especially for long-horizon predictions, and demonstrates strong cross-scene generalization.

We present AutoTraces, an autoregressive vision-language-trajectory model for robot trajectory forecasting in human-populated environments, which harnesses the inherent reasoning capabilities of large language models (LLMs) to model complex human behaviors. In contrast to prior works that rely solely on textual representations, our key innovation lies in a novel trajectory tokenization scheme, which represents waypoints with point tokens as categorical and positional markers while encoding waypoint numerical values as corresponding point embeddings, seamlessly integrated into the LLM's space through a lightweight encoder-decoder architecture. This design preserves the LLM's native autoregressive generation mechanism while extending it to physical coordinate spaces, and it facilitates modeling of long-term interactions in trajectory data. We further introduce an automated chain-of-thought (CoT) generation mechanism that leverages a multimodal LLM to infer spatio-temporal relationships from visual observations and trajectory data, eliminating reliance on manual annotation. Through a two-stage training strategy, AutoTraces achieves state-of-the-art (SOTA) forecasting accuracy, particularly in long-horizon prediction, while exhibiting strong cross-scene generalization and supporting flexible-length forecasting.
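
The tokenization idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `<point>` token name, the `Slot` container, the hand-written `encode_point`/`decode_point` maps, and the constant-velocity predictor standing in for the LLM are all assumptions made for illustration. It only shows the shape of the scheme: each waypoint occupies one marker token in the sequence, its numeric value travels as an embedding, and new waypoints are generated one at a time for any requested horizon.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

POINT = "<point>"  # hypothetical marker token; the paper's actual token name is not given


@dataclass
class Slot:
    """One sequence position: a token plus, for point tokens, its (x, y) value."""
    token: str
    value: Optional[Tuple[float, float]] = None


def tokenize_trajectory(prompt: Sequence[str],
                        waypoints: Sequence[Tuple[float, float]]) -> List[Slot]:
    # Text tokens stay as they are; each waypoint becomes one POINT slot
    # that carries its coordinates separately from the token identity.
    slots = [Slot(word) for word in prompt]
    slots += [Slot(POINT, (float(x), float(y))) for x, y in waypoints]
    return slots


def encode_point(value: Tuple[float, float]) -> List[float]:
    # Toy stand-in for a learned point encoder: a fixed linear map to 4 dims.
    x, y = value
    return [x, y, x + y, x - y]


def decode_point(embedding: Sequence[float]) -> Tuple[float, float]:
    # Toy decoder: the first two components of the embedding hold (x, y).
    return (embedding[0], embedding[1])


def forecast(slots: Sequence[Slot], steps: int) -> List[Slot]:
    """Append `steps` waypoints autoregressively.

    A constant-velocity extrapolation in embedding space stands in for the
    LLM here; it needs at least two observed waypoints.
    """
    out = list(slots)
    for _ in range(steps):
        points = [encode_point(s.value) for s in out if s.token == POINT]
        prev, last = points[-2], points[-1]
        nxt = [2 * l - p for l, p in zip(last, prev)]
        out.append(Slot(POINT, decode_point(nxt)))
    return out


if __name__ == "__main__":
    history = tokenize_trajectory(["robot", "path", ":"], [(0, 0), (1, 0.5)])
    for slot in forecast(history, steps=3):
        print(slot.token, slot.value)
```

Because generation proceeds one slot at a time, `steps` can be any length, which mirrors the flexible-length forecasting the abstract mentions; a real model would replace the extrapolation with the LLM's next-token prediction and learned encoder and decoder weights.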

