Addressing the Waypoint-Action Gap in End-to-End Autonomous Driving via Vehicle Motion Models
This work addresses a key bottleneck in autonomous driving research by making action-based models easier to train and compare, though the contribution is incremental because it builds on existing benchmarks.
The paper tackles the gap between waypoint-based and action-based models in end-to-end autonomous driving by proposing a differentiable vehicle-model framework that enables action-based policies to be trained and evaluated within waypoint-based benchmarks, achieving state-of-the-art performance on NAVSIM navhard.
End-to-End Autonomous Driving (E2E-AD) systems are typically grouped by the nature of their outputs: (i) waypoint-based models that predict a future trajectory, and (ii) action-based models that directly output throttle, steering, and brake commands. Most recent benchmark protocols and training pipelines are waypoint-based, which makes action-based policies harder to train and compare and slows their progress. To bridge this waypoint-action gap, we propose a novel, differentiable vehicle-model framework that rolls out predicted action sequences into their corresponding ego-frame waypoint trajectories, so that supervision can be applied in waypoint space. Our approach enables action-based architectures to be trained and evaluated, for the first time, within waypoint-based benchmarks without modifying the underlying evaluation protocol. We extensively evaluate our framework across multiple challenging benchmarks and observe consistent improvements over the baselines. In particular, on NAVSIM \texttt{navhard} our approach achieves state-of-the-art performance. Our code will be made publicly available upon acceptance.
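To make the rollout idea concrete, the following is a minimal sketch of how an action sequence could be turned into ego-frame waypoints and compared against a target trajectory. It is not the paper's implementation: the abstract does not specify the vehicle model, so a kinematic bicycle model is assumed here, and every name and parameter (`rollout_bicycle`, `waypoint_l2_loss`, `wheelbase`, `max_accel`, `max_brake`, `max_steer`, `dt`) is hypothetical. The sketch uses plain Python arithmetic; written with an automatic-differentiation library, the same operations would presumably let a waypoint-space loss propagate gradients back to the actions.

```python
import math


def rollout_bicycle(actions, v0=0.0, dt=0.5, wheelbase=2.9,
                    max_accel=3.0, max_brake=6.0, max_steer=0.6):
    """Roll out (throttle, steer, brake) actions into ego-frame (x, y) waypoints.

    Assumed kinematic bicycle model; all parameter values are illustrative
    and are not taken from the paper.
    """
    x, y, yaw, v = 0.0, 0.0, 0.0, v0  # ego frame: start at origin, facing +x
    waypoints = []
    for throttle, steer, brake in actions:
        # Map normalized pedal commands to a longitudinal acceleration.
        accel = throttle * max_accel - brake * max_brake
        # Map normalized steering in [-1, 1] to a front-wheel angle in radians.
        delta = steer * max_steer
        # Update speed first (no reversing), then integrate the pose.
        v = max(0.0, v + accel * dt)
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v / wheelbase * math.tan(delta) * dt
        waypoints.append((x, y))
    return waypoints


def waypoint_l2_loss(pred, target):
    """Mean Euclidean distance between predicted and target waypoints."""
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, target)]
    return sum(dists) / len(dists)


# Usage: full throttle, no steering, no brake for four steps.
straight = rollout_bicycle([(1.0, 0.0, 0.0)] * 4)
# Same throttle with a constant positive steering command.
turning = rollout_bicycle([(1.0, 0.5, 0.0)] * 4)
loss = waypoint_l2_loss(turning, straight)
```

Because the output of the rollout is already a waypoint trajectory, a policy wrapped this way could be scored by a waypoint-based benchmark without changing the benchmark itself, which is the property the abstract emphasizes.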