ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving
This addresses a fundamental optimization challenge in autonomous driving systems, offering a novel method to improve trajectory prediction and safety, though it is incremental in refining existing approaches.
The paper tackles the spatio-temporal imbalance in end-to-end autonomous driving trajectory data, which leads to suboptimal learning and safety issues, by proposing ResAD, a framework that predicts residual deviations from an inertial reference and uses point-wise normalization, achieving a state-of-the-art PDMS of 88.6 on the NAVSIM benchmark.
End-to-end autonomous driving (E2EAD) systems, which learn to predict future trajectories directly from sensor data, are fundamentally challenged by the inherent spatio-temporal imbalance of trajectory data. This imbalance creates a significant optimization burden, causing models to learn spurious correlations instead of causal inference, while also prioritizing uncertain, distant predictions, thereby compromising immediate safety. To address these issues, we propose ResAD, a novel Normalized Residual Trajectory Modeling framework. Instead of predicting the future trajectory directly, our approach reframes the learning task to predict the residual deviation from a deterministic inertial reference. The inertial reference serves as a counterfactual, forcing the model to move beyond simple pattern recognition and instead identify the underlying causal factors (e.g., traffic rules, obstacles) that necessitate deviations from a default, inertially-guided path. To deal with the optimization imbalance caused by uncertain, long-term horizons, ResAD further incorporates Point-wise Normalization of the predicted residual. It re-weights the optimization objective, preventing large-magnitude errors associated with distant, uncertain waypoints from dominating the learning signal. Extensive experiments validate the effectiveness of our framework. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy with only two denoising steps, demonstrating that our approach significantly simplifies the learning task and improves model performance. The code will be released to facilitate further research.