ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
This work aims to improve planning accuracy in end-to-end autonomous driving systems. It is an incremental advance that refines existing world model approaches.
The paper tackles two problems in world models for autonomous driving: redundant modeling of static regions and insufficient interaction with trajectories. It proposes a Temporal Residual World Model (TR-World) that focuses on dynamic object modeling, together with a Future-Guided Trajectory Refinement (FGTR) module, and reports state-of-the-art planning performance on the nuScenes and NAVSIM datasets.
The comprehensive scene understanding of world models has significantly improved the planning accuracy of end-to-end autonomous driving frameworks. However, redundant modeling of static regions and a lack of deep interaction with trajectories prevent world models from reaching their full effectiveness. In this paper, we propose the Temporal Residual World Model (TR-World), which focuses on dynamic object modeling. By computing the temporal residuals of scene representations, information about dynamic objects can be extracted without relying on detection and tracking. TR-World takes only temporal residuals as input and can therefore predict the future spatial distribution of dynamic objects more precisely. Combining this prediction with the static object information contained in the current BEV features yields accurate future BEV features. Furthermore, we propose the Future-Guided Trajectory Refinement (FGTR) module, which performs interaction between prior trajectories (predicted from the current scene representation) and the future BEV features. This module not only uses future road conditions to refine trajectories but also provides sparse spatial-temporal supervision on future BEV features to prevent world model collapse. Comprehensive experiments on the nuScenes and NAVSIM datasets demonstrate that our method, ResWorld, achieves state-of-the-art planning performance. The code is available at https://github.com/mengtan00/ResWorld.git.
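
The temporal-residual idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the single-channel grid, the function names (`temporal_residual`, `toy_residual_predictor`, `compose_future_bev`), the constant-motion "predictor" that stands in for the learned world model, and the use of plain addition to combine the prediction with the current BEV map are all assumptions made for illustration only.

```python
from typing import List

# Toy single-channel BEV map, indexed [row][col]. Real BEV features are
# multi-channel tensors; this simplification is an assumption.
Grid = List[List[float]]


def temporal_residual(prev_bev: Grid, curr_bev: Grid) -> Grid:
    """Element-wise difference between two consecutive BEV maps.

    Cells that did not change (static regions) come out as zero, so only
    moving content remains in the residual.
    """
    return [
        [c - p for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_bev, curr_bev)
    ]


def toy_residual_predictor(residual: Grid) -> Grid:
    """Hypothetical stand-in for the learned world model.

    Assumes every object keeps moving one cell to the right per step, so the
    future residual is the current residual shifted right by one column.
    """
    return [[0.0] + row[:-1] for row in residual]


def compose_future_bev(curr_bev: Grid, future_residual: Grid) -> Grid:
    """Combine the current map (static content) with the predicted change.

    Plain addition is an assumption; the fusion used in the paper may differ.
    """
    return [
        [c + r for c, r in zip(curr_row, res_row)]
        for curr_row, res_row in zip(curr_bev, future_residual)
    ]


if __name__ == "__main__":
    # A static structure (value 5) in column 0 and one moving object (value 1)
    # that goes from column 1 to column 2 in the top row.
    prev_bev = [[5.0, 1.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
    curr_bev = [[5.0, 0.0, 1.0, 0.0], [5.0, 0.0, 0.0, 0.0]]

    residual = temporal_residual(prev_bev, curr_bev)
    future_bev = compose_future_bev(curr_bev, toy_residual_predictor(residual))
    print(residual)
    print(future_bev)
```

In this toy setting the static column contributes nothing to the residual, so the predictor only has to model the moving object, and the static content is recovered from the current map when the two are combined. The FGTR module is not covered by this sketch.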