Large Trajectory Models are Scalable Motion Predictors and Planners
This work addresses motion prediction and planning for autonomous driving. It presents a novel method that unifies trajectory generation with sequence modeling, although its use of model scaling borrowed from language models is incremental.
The paper tackles motion prediction and planning in autonomous driving by introducing a scalable trajectory model, the State Transformer (STR), which reformulates both tasks as a single sequence modeling problem. It reports that large trajectory models follow scaling laws and show outstanding adaptability and learning efficiency. It also reports that these models make plausible predictions in out-of-distribution scenarios and perform complex long-term reasoning without explicit loss designs.
Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning about traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task. Our approach unites trajectory generation problems with other sequence modeling problems, enabling rapid iteration that builds on breakthroughs in neighboring domains such as language modeling. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to scaling laws, exhibiting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to perform complex reasoning for long-term planning, without explicit loss designs or costly high-level annotations.
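To make the idea of "arranging observations, states, and actions into one unified sequence" concrete, the sketch below interleaves per-timestep embeddings into a single token sequence and builds the causal mask a decoder-style sequence model would use. It is only an illustration: the interleaved order (observation, state, action per timestep), the function names `build_sequence` and `causal_mask`, and the shared embedding width are assumptions made here, not the token layout or code used by STR.

```python
import numpy as np


def build_sequence(observations, states, actions):
    """Interleave embeddings as [o_1, s_1, a_1, o_2, s_2, a_2, ...].

    Hypothetical layout for illustration; each input has shape (T, D).
    """
    observations = np.asarray(observations, dtype=float)
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    if not (observations.shape == states.shape == actions.shape):
        raise ValueError("all inputs must share the same (T, D) shape")
    num_steps, dim = observations.shape
    # Stack to (T, 3, D), then flatten the first two axes to (3T, D),
    # which keeps the three tokens of each timestep adjacent.
    stacked = np.stack([observations, states, actions], axis=1)
    return stacked.reshape(3 * num_steps, dim)


def causal_mask(length):
    """Lower-triangular mask: token i may attend to tokens 0..i only."""
    return np.tril(np.ones((length, length), dtype=bool))


# Toy example with T=2 timesteps and D=2 embedding dimensions.
obs = [[1.0, 1.0], [4.0, 4.0]]
sta = [[2.0, 2.0], [5.0, 5.0]]
act = [[3.0, 3.0], [6.0, 6.0]]
tokens = build_sequence(obs, sta, act)
mask = causal_mask(len(tokens))
```

With this layout, both prediction and planning reduce to next-token generation over the same sequence, which is the property that lets a trajectory model reuse architectures developed for language modeling.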