MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving
MTDrive targets complex, long-tail scenarios in autonomous driving by enabling iterative trajectory refinement, though the contribution is incremental because it builds on existing MLLM-RL integration.
The paper tackles trajectory planning in autonomous driving by introducing MTDrive, a multi-turn framework that integrates Multi-modal Large Language Models with Reinforcement Learning to iteratively refine trajectories. It reports superior performance to existing methods on the NAVSIM benchmark and a 2.5x training throughput improvement.
Trajectory planning is a core task in autonomous driving, requiring the prediction of safe and comfortable paths across diverse scenarios. Integrating Multi-modal Large Language Models (MLLMs) with Reinforcement Learning (RL) has shown promise in addressing "long-tail" scenarios. However, existing methods are constrained to single-turn reasoning, limiting their ability to handle complex tasks requiring iterative refinement. To overcome this limitation, we present MTDrive, a multi-turn framework that enables MLLMs to iteratively refine trajectories based on environmental feedback. MTDrive introduces Multi-Turn Group Relative Policy Optimization (mtGRPO), which mitigates reward sparsity by computing relative advantages across turns. We further construct an interactive trajectory understanding dataset from closed-loop simulation to support multi-turn training. Experiments on the NAVSIM benchmark demonstrate superior performance compared to existing methods, validating the effectiveness of our multi-turn reasoning paradigm. Additionally, we implement system-level optimizations to reduce data transfer overhead caused by high-resolution images and multi-turn sequences, achieving 2.5x training throughput. Our data, models, and code will be made available soon.
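The abstract does not spell out how mtGRPO computes its advantages, so the following is only a minimal sketch of one plausible reading. It assumes each rollout in a sampled group receives a scalar reward at every refinement turn, and that advantages are normalized within the group separately for each turn, in the style of standard GRPO. The function names, the `turn_rewards[i][t]` layout, and the `eps` constant are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout has the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_turn_advantages(turn_rewards, eps=1e-6):
    """Assumed multi-turn variant (not the paper's exact formulation).

    turn_rewards[i][t] is the reward of rollout i at refinement turn t.
    Returns advantages[i][t], normalized across the group for each turn.
    """
    num_turns = len(turn_rewards[0])
    if any(len(row) != num_turns for row in turn_rewards):
        raise ValueError("every rollout must have the same number of turns")

    # Normalize each turn's rewards across the group of rollouts.
    columns = [
        group_relative_advantages([row[t] for row in turn_rewards], eps)
        for t in range(num_turns)
    ]
    # Transpose back so the result is indexed as [rollout][turn].
    return [
        [columns[t][i] for t in range(num_turns)]
        for i in range(len(turn_rewards))
    ]


# Three rollouts, two turns: they differ at turn 0 and tie at turn 1.
example = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
advantages = per_turn_advantages(example)
```

Under this assumption, every turn contributes its own learning signal instead of a single reward at the end of the episode, which is one way a multi-turn scheme could reduce reward sparsity. In the example, the turn where all rollouts tie yields zero advantage for each of them, while the turn where they differ separates the best rollout from the worst.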