Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning
This work addresses the problem of efficient and precise trajectory planning for autonomous vehicles, offering incremental improvements in fidelity and control over existing generative models.
The paper tackles the challenge of generating diverse and plausible future trajectories for autonomous vehicle planning by introducing the Efficient Virtuoso, a conditional latent diffusion model with a two-stage normalization pipeline, achieving a state-of-the-art minADE of 0.25 on the Waymo Open Motion Dataset.
The ability to generate a diverse and plausible distribution of future trajectories is a critical capability for autonomous vehicle planning systems. While recent generative models have shown promise, achieving high fidelity, computational efficiency, and precise control remains a significant challenge. In this paper, we present the Efficient Virtuoso, a conditional latent diffusion model for goal-conditioned trajectory planning. Our approach introduces a novel two-stage normalization pipeline that first scales trajectories to preserve their geometric aspect ratio and then normalizes the resulting PCA latent space to ensure a stable training target. The denoising process is performed efficiently in this low-dimensional latent space by a simple MLP denoiser, which is conditioned on a rich scene context fused by a powerful Transformer-based StateEncoder. We demonstrate that our method achieves state-of-the-art performance on the Waymo Open Motion Dataset, achieving a minimum Average Displacement Error (minADE) of 0.25. Furthermore, through a rigorous ablation study on goal representation, we provide a key insight: while a single endpoint goal can resolve strategic ambiguity, a richer, multi-step sparse route is essential for enabling the precise, high-fidelity tactical execution that mirrors nuanced human driving behavior.