An Open-Source Modular Benchmark for Diffusion-Based Motion Planning in Closed-Loop Autonomous Driving
This work addresses a critical gap for autonomous driving researchers and engineers by enabling more realistic, configurable testing of diffusion planners inside a production driving stack, though its contribution is incremental: it focuses on benchmarking rather than on new algorithms.
The paper tackles the lack of closed-loop evaluation for diffusion-based motion planners in production autonomous driving stacks by developing an open-source modular benchmark integrated into the Autoware stack. It shows that encoder caching reduces latency by 3.2x and that second-order solving reduces FDE (final displacement error) by 41% at N=3 compared to first-order solving.
Diffusion-based motion planners have achieved state-of-the-art results on benchmarks such as nuPlan, yet their evaluation within closed-loop production autonomous driving stacks remains largely unexplored. Existing evaluations abstract away ROS 2 communication latency and real-time scheduling constraints, while monolithic ONNX deployment freezes all solver parameters at export time. We present an open-source modular benchmark that addresses both gaps: using ONNX GraphSurgeon, we decompose a monolithic 18,398-node diffusion planner into three independently executable modules and reimplement the DPM-Solver++ denoising loop in native C++. Integrated as a ROS 2 node within Autoware, the open-source AD stack deployed on real vehicles worldwide, the system allows solver parameters to be configured at runtime without model recompilation and exposes the denoising process step by step, opening up the black box of monolithic deployment. Unlike evaluations in standalone simulators such as CARLA, our benchmark operates within a production-grade stack and is validated through closed-loop simulation in AWSIM. Through a systematic comparison of DPM-Solver++ (first- and second-order) and DDIM across six step-count configurations (N ∈ {3, 5, 7, 10, 15, 20}), we show that encoder caching yields a 3.2x latency reduction and that second-order solving reduces FDE by 41% at N=3 compared to first-order solving. The complete codebase will be released as open source, providing a direct path from simulation benchmarks to real-vehicle deployment.
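To make the runtime-configurable solver and encoder caching more concrete, here is a minimal C++ sketch of a DPM-Solver++ sampling loop in its data-prediction form, with a first-order update and the second-order multistep ("2M") correction. It is not the benchmark's code. The `Encoder`/`Denoiser` interfaces, the `SolverConfig` fields, the linear-beta VP noise schedule (beta_0 = 0.1, beta_1 = 20), and the time-uniform step grid are all assumptions made for illustration. The encoder runs once per planning cycle and its output is reused by all N denoiser calls, which is the caching idea. Step count and solver order are ordinary runtime values rather than constants frozen into an exported graph.

```cpp
// Sketch only: Encoder/Denoiser/SolverConfig are hypothetical stand-ins for the
// decomposed modules; the VP schedule and time grid are assumptions.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Assumed VP noise schedule with linear beta: log(alpha_t) = -(b1-b0)t^2/4 - b0*t/2.
struct VpSchedule {
  double beta0 = 0.1, beta1 = 20.0;
  double log_alpha(double t) const { return -0.25 * t * t * (beta1 - beta0) - 0.5 * t * beta0; }
  double alpha(double t) const { return std::exp(log_alpha(t)); }
  double sigma(double t) const { return std::sqrt(1.0 - std::exp(2.0 * log_alpha(t))); }
  // Half log-SNR: lambda_t = log(alpha_t / sigma_t).
  double lambda(double t) const { return log_alpha(t) - std::log(sigma(t)); }
};

// Hypothetical runtime-tunable solver parameters (no model re-export needed).
struct SolverConfig {
  int num_steps = 5;     // N
  int order = 2;         // 1 = first-order, 2 = second-order multistep (2M)
  double t_end = 1e-3;   // stop slightly above t = 0, where sigma_t = 0
};

// Hypothetical module interfaces: encoder -> scene context; denoiser -> x0 prediction.
using Encoder = std::function<Vec(const Vec& observation)>;
using Denoiser = std::function<Vec(const Vec& x_t, double t, const Vec& context)>;

Vec sample_dpm_solver_pp(const Vec& observation, Vec x, const Encoder& encode,
                         const Denoiser& denoise, const SolverConfig& cfg,
                         const VpSchedule& sched = VpSchedule{}) {
  assert(cfg.num_steps >= 1 && (cfg.order == 1 || cfg.order == 2));
  // Encoder caching: compute the scene context once, reuse it for every step.
  const Vec context = encode(observation);

  Vec x0_prev;          // previous data prediction, used by the 2M correction
  double h_prev = 0.0;  // previous step size in lambda
  for (int i = 0; i < cfg.num_steps; ++i) {
    // Time-uniform grid from t = 1 down to t_end (assumed; log-SNR grids also exist).
    const double s = 1.0 - (1.0 - cfg.t_end) * i / cfg.num_steps;
    const double t = 1.0 - (1.0 - cfg.t_end) * (i + 1) / cfg.num_steps;
    const double h = sched.lambda(t) - sched.lambda(s);  // > 0 because lambda grows as t -> 0
    const Vec x0 = denoise(x, s, context);               // one denoiser call per step

    // DPM-Solver++ update: x_t = (sigma_t/sigma_s) x_s - alpha_t (e^{-h} - 1) D.
    const double ratio = sched.sigma(t) / sched.sigma(s);
    const double phi = -sched.alpha(t) * std::expm1(-h);
    for (std::size_t k = 0; k < x.size(); ++k) {
      double d = x0[k];  // first order: D = current x0 prediction
      if (cfg.order == 2 && i > 0) {
        // 2M: D = (1 + 1/(2r)) x0_i - (1/(2r)) x0_{i-1}, with r = h_prev / h.
        const double r = h_prev / h;
        d = (1.0 + 0.5 / r) * x0[k] - (0.5 / r) * x0_prev[k];
      }
      x[k] = ratio * x[k] + phi * d;
    }
    x0_prev = x0;
    h_prev = h;
  }
  return x;
}
```

In use, one would wrap the decomposed encoder and denoiser modules in these two `std::function` types and change `cfg.num_steps` or `cfg.order` between planning cycles; nothing needs to be re-exported. The first step always falls back to first order because the multistep correction needs a previous prediction.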