RO CV | Oct 21, 2024

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving

Qiao Sun, Huimin Wang, Jiahao Zhan, Fan Nie, Xin Wen, Leimeng Xu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

arXiv:2410.15774v215.726 citations | h-index: 13 | Has Code | ICRA

Originality: Incremental advance

AI Analysis

This work addresses generalization challenges in autonomous driving motion planning, offering a scalable solution that improves performance on complex and few-shot cases, though it is incremental as it builds on existing transformer and mixture-of-experts methods.

The paper tackles the limited generalization of data-driven motion planners in autonomous driving by introducing StateTransformer-2 (STR2), a scalable decoder-only planner with a Vision Transformer encoder and a mixture-of-experts architecture. STR2 generalizes better than previous approaches on the NuPlan dataset, and its accuracy improves consistently when scaled to billions of scenarios.

Large real-world driving datasets have sparked significant research into data-driven motion planners for autonomous driving, covering data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalization on complicated and few-shot cases than previous methods. However, experimental results show that many of these approaches generalize poorly in planning performance because of overly complex designs or training paradigms. In this paper, we review and benchmark previous methods with a focus on generalization. The experimental results indicate that as models are appropriately scaled, many design elements become redundant. We introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner that uses a Vision Transformer (ViT) encoder and a mixture-of-experts (MoE) causal Transformer architecture. The MoE backbone addresses modality collapse and reward balancing through expert routing during training. Extensive experiments on the NuPlan dataset show that our method generalizes better than previous approaches across different test sets and closed-loop simulations. Furthermore, we assess its scalability on billions of real-world urban driving scenarios, demonstrating consistent accuracy improvements as both data and model size grow.
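
The abstract mentions expert routing in an MoE backbone without giving details. As a rough illustration only, the following is a minimal sketch of generic top-k expert routing in plain Python. It is not the STR2 implementation: the function names (`top_k_route`, `moe_layer`), the toy scaling experts, the random router weights, and the choice of k = 2 are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts.

    Returns [(expert_index, weight)], with weights renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, router_weights, experts, k=2):
    """Route one token vector to k experts and mix their outputs."""
    # Router logit per expert: dot product of the token with that expert's routing vector.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    output = [0.0] * len(token)
    for index, weight in top_k_route(logits, k):
        expert_output = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_output)]
    return output


def make_scaling_expert(factor):
    """Hypothetical toy expert: scales its input by a fixed factor."""
    return lambda token: [factor * x for x in token]


# Toy setup (all values are illustrative assumptions).
experts = [make_scaling_expert(f) for f in (0.5, 1.0, 2.0, 4.0)]
rng = random.Random(0)
router_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in experts]
token = [0.2, -0.1, 0.7]

routes = top_k_route(
    [sum(w * x for w, x in zip(row, token)) for row in router_weights], k=2
)
output = moe_layer(token, router_weights, experts, k=2)
```

In this sketch only the selected experts are evaluated for a given token, which is the general reason MoE layers can add parameters without a proportional increase in per-token compute. How STR2 configures its router and experts is not stated in the abstract.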
