RO | AI | LG | Jan 30

MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving

arXiv:2601.22930v1
h-index: 6
Originality: Incremental advance
AI Analysis

MTDrive addresses complex, long-tail scenarios in autonomous driving by enabling iterative trajectory refinement, but the advance is incremental because it builds on existing MLLM-RL integration.

The paper tackles trajectory planning in autonomous driving by introducing MTDrive, a multi-turn framework that integrates Multi-modal Large Language Models (MLLMs) with Reinforcement Learning (RL) to iteratively refine trajectories. It reports better performance than existing methods on the NAVSIM benchmark and a 2.5x improvement in training throughput.

Trajectory planning is a core task in autonomous driving, requiring the prediction of safe and comfortable paths across diverse scenarios. Integrating Multi-modal Large Language Models (MLLMs) with Reinforcement Learning (RL) has shown promise in addressing "long-tail" scenarios. However, existing methods are constrained to single-turn reasoning, limiting their ability to handle complex tasks requiring iterative refinement. To overcome this limitation, we present MTDrive, a multi-turn framework that enables MLLMs to iteratively refine trajectories based on environmental feedback. MTDrive introduces Multi-Turn Group Relative Policy Optimization (mtGRPO), which mitigates reward sparsity by computing relative advantages across turns. We further construct an interactive trajectory understanding dataset from closed-loop simulation to support multi-turn training. Experiments on the NAVSIM benchmark demonstrate superior performance compared to existing methods, validating the effectiveness of our multi-turn reasoning paradigm. Additionally, we implement system-level optimizations to reduce data transfer overhead caused by high-resolution images and multi-turn sequences, achieving 2.5x training throughput. Our data, models, and code will be made available soon.
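The abstract says mtGRPO computes relative advantages across turns but gives no formula. As a rough illustration only, the sketch below applies the standard GRPO-style group normalization (reward minus group mean, divided by group standard deviation) separately at each turn. The function names, the `turn_rewards` layout, and the choice to normalize turn by turn are all assumptions made for this example, not the paper's actual method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_turn_advantages(turn_rewards):
    """Hypothetical per-turn variant (an assumption, not the paper's mtGRPO).

    turn_rewards[i][t] is the reward of rollout i at turn t. Each turn's
    rewards are normalized across the group of rollouts, so every turn
    yields its own learning signal instead of one final sparse reward.
    """
    num_rollouts = len(turn_rewards)
    num_turns = len(turn_rewards[0])
    # Gather the rewards of all rollouts for each turn.
    columns = [[turn_rewards[i][t] for i in range(num_rollouts)]
               for t in range(num_turns)]
    normalized = [group_relative_advantages(col) for col in columns]
    # Transpose back to the [rollout][turn] layout.
    return [[normalized[t][i] for t in range(num_turns)]
            for i in range(num_rollouts)]


# Three rollouts, two refinement turns each (made-up rewards).
example_rewards = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
example_advantages = per_turn_advantages(example_rewards)
```

In this toy group, the second rollout is the only one with a non-zero reward at the first turn, so it receives a positive first-turn advantage while the other two receive negative ones.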
