CV · RO · Sep 28, 2025

Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning

arXiv:2509.23993v1 · 8 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses an important challenge for autonomous driving by improving simulation realism, though the contribution appears incremental because it builds on existing fine-tuning methods.

The paper tackles distributional shift in multi-agent traffic simulation by proposing SMART-R1, an R1-style reinforcement fine-tuning paradigm that achieves state-of-the-art performance, with a realism meta score of 0.7858 on the Waymo Open Sim Agents Challenge.

Scalable and realistic simulation of multi-agent traffic behavior is critical for advancing autonomous driving technologies. Although existing data-driven simulators have made significant strides in this domain, they predominantly rely on supervised learning to align simulated distributions with real-world driving scenarios. A persistent challenge, however, is the distributional shift between training and testing, which often undermines model generalization in unseen environments. To address this limitation, we propose SMART-R1, a novel R1-style reinforcement fine-tuning paradigm tailored for next-token prediction models to better align agent behavior with human preferences and evaluation metrics. Our approach introduces a metric-oriented policy optimization algorithm to improve distribution alignment and an iterative "SFT-RFT-SFT" training strategy that alternates between Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) to maximize performance gains. Extensive experiments on the large-scale Waymo Open Motion Dataset (WOMD) validate the effectiveness of this simple yet powerful R1-style training framework in enhancing foundation models. The results on the Waymo Open Sim Agents Challenge (WOSAC) show that SMART-R1 achieves state-of-the-art performance with an overall realism meta score of 0.7858, ranking first on the leaderboard at the time of submission.
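
The abstract does not spell out the metric-oriented policy optimization objective, so the following is only a rough sketch under stated assumptions. It assumes a GRPO-like setup (GRPO is the algorithm used for DeepSeek-R1, which "R1-style" alludes to): several rollouts are sampled per scenario, each rollout's reward is a realism metric score, rewards are normalized within the group, and a PPO-style clipped surrogate is applied at the sequence level. The function names, the clip value 0.2, the group normalization, and the sequence-level ratio are all illustrative assumptions, not SMART-R1's published method. The last function shows the SFT-RFT-SFT ordering with hypothetical stage callables.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for K rollouts of the same scenario (assumption).

    Each rollout's reward (assumed here to be a realism metric score) is
    centred on the group mean and scaled by the group standard deviation.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def clipped_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate loss (to be minimized), averaged over rollouts.

    logp_new / logp_old are assumed to be summed log-probabilities of each
    rollout's token sequence under the current and the sampling policy.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Keep the pessimistic (smaller) objective per rollout, then negate for a loss.
    return -float(np.mean(np.minimum(unclipped, clipped)))


def sft_rft_sft(model, sft_stage, rft_stage):
    """Alternating schedule: supervised, then reinforcement, then supervised again.

    sft_stage and rft_stage are hypothetical callables that train and return a model.
    """
    model = sft_stage(model)
    model = rft_stage(model)
    return sft_stage(model)


if __name__ == "__main__":
    # Hypothetical realism scores for four rollouts of one scenario.
    adv = group_relative_advantages([0.70, 0.75, 0.80, 0.75])
    print(np.round(adv, 3))
    # With an unchanged policy the ratio is 1, so the loss reduces to -mean(adv).
    print(clipped_policy_loss([0.0] * 4, [0.0] * 4, adv))
    print(sft_rft_sft([], lambda m: m + ["SFT"], lambda m: m + ["RFT"]))
```

With hypothetical scores of 0.70, 0.75, 0.80 and 0.75, the best rollout receives an advantage of about +1.41, the worst about -1.41, and the two average rollouts about 0. Normalizing within the group means no learned value function is needed, which is one reason this family of methods is attractive for fine-tuning a pretrained next-token predictor.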
